UIUCDCS-R-73-604

C00-2118-00U9 




PICTURE  ANALYSIS  BY  GRAPH  TRANSFORMATION 

by 
Ahmad E. Masumi


October  1973 



DEPARTMENT  OF  COMPUTER  SCIENCE 
UNIVERSITY  OF  ILLINOIS  AT  URBANA-CHAMPAIGN 


URBANA,  ILLINOIS 



... AT(11-1)2118 and submitted in partial fulfillment of ...

ACKNOWLEDGMENT 

I extend my deepest gratitude to my thesis advisor, Professor Bruce
H. McCormick, who, in spite of his departure from the Department of Computer
Science and his heavy new responsibilities as head of the Information
Engineering Department of the University of Illinois at Chicago Circle, has
been the prime contributor to the completion of this thesis.

I  also  want  to  extend  my  cordial  regards  to  Professor  J.  N.  Snyder, 
the  head  of  our  department,  for  his  understanding  and  provision  of  necessary 
funds  for  this  research. 

My  best  regards  are  also  extended  to  Mrs.  Judy  Rudicil  who  has  been 
extremely patient with my handwriting and has typed a fine thesis. Many
thanks to Stanley Zundo, who raced against time to finish the figures and
drawings on time.




TABLE OF CONTENTS

1.  INTRODUCTION
    1.1  Relevant Background
    1.2  Formulation of the Problem

2.  GRAPHICAL REPRESENTATION OF PICTURES
    2.1  Regions as Graph Nodes
         2.1.1  Vertex-primitive
         2.1.2  Arc-primitive
         2.1.3  Region primitives
    2.2  Relations as Graph Branches
         2.2.1  Containment relation
         2.2.2  Neighborhood relations
    2.3  Graph Definition
    2.4  Linguistic Description

3.  GRAPH TRANSFORMATIONS
    3.1  Domain and Range of Transformation
    3.2  Parsing vs. Transformation
    3.3  Validity of Embedding Relations
    3.4  One Class of Relations
         3.4.1  Embedding function for this class of relation
         3.4.2  Other useful operations on this set of relations

4.  MODELS AND PARSING
    4.1  Graph Structure Definitions
    4.2  Parsing the Graphical Representation of the Scene
         4.2.1  Relation preprocessing
         4.2.2  Nodal class assignment
         4.2.3  Selection of an attention point
         4.2.4  Search of domains for transformations
         4.2.5  Actual parsing of a found domain
         4.2.6  Back-up procedure
         4.2.7  Heuristics
    4.3  Best-Match Feature of the Recognizer
    4.4  Other Useful Transformations
         4.4.1  Rotational transformations

5.  APPLICATION
    5.1  Primitive Classes
         5.1.1  Shape attribute
         5.1.2  Compactness attribute
    5.2  Relations
    5.3  Models and Graph Structure
    5.4  Careful Analysis of an Example
    5.5  Other Features of the Recognizer
         5.5.1  Recognition of incomplete objects
         5.5.2  Scenes with varieties of the same object
         5.5.3  Preprocessing of the scene graph
         5.5.4  Recognition of the different views of an object
    5.6  Observations

6.  LEARNING
    6.1  Addition or Deletion of Objects from the Universe
    6.2  Saving the Incomplete Domains of the Rules

7.  CONCLUSIONS AND SUGGESTIONS FOR FUTURE WORK
    7.1  Parallel Processing
    7.2  Occluded Objects
    7.3  Relational Files

LIST OF REFERENCES

APPENDICES

VITA


1.   INTRODUCTION 

Our long-range goal is to develop a system which is able to analyze its
visual environment (in other words, a system which is able to see) and to
function intelligently based on the information extracted from the scenes.
Our immediate objective in this work is to establish and identify the
integral parts of an artificial system capable of visual perception. We
mainly concern ourselves with global aspects of processing, i.e., some
preprocessing is assumed on the picture before it is treated by this model.
These processes typically consist of operations which can abstract low-level
primitive elements as well as find relations between the elements of the
scene.

The deduction is accomplished by a graph structure. Because we cannot
describe a picture in terms of strings of subpictures (except for a few
exceptional classes of pictures), phrase-structure grammars cannot be
used directly. The rewriting rules must act on more general entities such as
arrays, drawings, labeled graphs (webs), multigraphs, etc. For example,
Kirsch (1964) and Dacey (1967) designed grammars for two-dimensional
languages where the generating rules act on arrays. Pfaltz and Rosenfeld
(1969) used for picture description the so-called web grammars, in which the
rules act on labeled directed graphs. Simply put, in a picture grammar one
tries to replace the rigid ordering of symbols by the partial ordering of a
graph structure so that the parsing can still work.

1.1  Relevant Background

A 1966 collection of papers edited by Leonard Uhr [1] deals with various
problems of pattern recognition by computers. However, many of the more
crucial problems, such as how humans encode shape information, detect near
similarity or dissimilarity of shapes, focus attention upon particular
aspects of shapes, extract global information from acquired local
information, and achieve perceptual invariance, are not yet adequately
understood.

In processing pictures with the help of computers, the problem that has
been most extensively studied is the recognition problem. Typically, this
has been posed as a problem in categorization as follows: given a finite set
of picture prototypes and a token of one of the prototypes, the task is to
assign the token to the correct prototype. Attempts to solve this problem
have traditionally been decision-theoretic in their approach. With each
prototype is associated a list of attributes, and with each attribute a set
of values. The space of attribute values is then partitioned into mutually
disjoint regions, and each region is assigned to one of the prototypes. Given
a token, its attribute values are computed, and using these computed
values one determines to which region in the property space it belongs.
Accordingly, the input token is categorized as belonging to one or another
prototype.
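The categorization scheme just described can be illustrated with a minimal Python sketch. It assumes that the regions of the property space are axis-aligned boxes; the attribute names (`compactness`, `corners`), the prototype names, and the numeric bounds are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A box in attribute space assigned to one prototype (illustrative)."""
    prototype: str
    bounds: dict  # attribute name -> (low, high); low inclusive, high exclusive


def categorize(token, regions):
    """Return the prototype whose region contains the token, or None."""
    for region in regions:
        inside = all(low <= token[name] < high
                     for name, (low, high) in region.bounds.items())
        if inside:
            return region.prototype
    return None


# Hypothetical, mutually disjoint regions of a two-attribute property space.
REGIONS = [
    Region("circle", {"compactness": (0.8, 1.01), "corners": (0, 1)}),
    Region("square", {"compactness": (0.6, 0.8), "corners": (4, 5)}),
]

token = {"compactness": 0.9, "corners": 0}
print(categorize(token, REGIONS))
```

Note that the token is treated as a single atomic entity: nothing about its internal structure survives the lookup, which is the limitation the following paragraphs discuss.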

Michalski [2] defines a variable-valued logic system which is capable
of this kind of categorization. In this approach, the attributes must have
discrete values, or their range of values must be divided into discrete
intervals. This
procedure  treats  each  prototype  and  token  as  a  single  atomic  entity.   The 
methodology  of  attribute  assignments  and  attribute  value  computations  is 
guided  primarily  by  the  desire  to  optimize  the  partitioning  of  the  property 
space  and  to  devise  inference  techniques  to  make  minimal  error  decisions  on 
the  basis  of  statistics  computed.   No  other  aspects  of  picture  description 
or  analysis  play  any  role  in  this  approach. 

A more careful study and critical analysis of earlier works would show
certain inherent inadequacies in this general methodology. These
inadequacies relate to the fact that in dealing with pictures (or, more
properly, classes of pictures) the really relevant and significant problems
are concerned with picture description and its analysis. Picture recognition
is actually only one aspect of this larger problem of picture analysis.
Hence, an adequate framework for coping with the recognition problem in its
generality must be capable of analyzing the input picture and generating a
structured description of it, and not be restricted merely to making a
"YES," "NO," "DON'T KNOW" decision.

We emphasize that the argument here is not that the classificatory
schemata and descriptive schemata are mutually exclusive, or otherwise
incompatible, in solving the recognition problem. Rather, our approach will
positively  incorporate  this  classificatory  information  in  a  recognition 
technique  based  on  the  structural  description  of  the  input  picture.   A 
classificatory  technique  based  on  property  lists  is  a  degenerate  description 
with  the  structural  information  missing. 

Abstraction  of  information  is  an  essential  part  of  the  activities  that 
any  intelligent  system  would  have  to  perform  in  the  course  of  analysis  and 
recognition  of  pictures.   A  picture  may  be  regarded  as  a  function  over  some 
two-dimensional  domain.   By  appropriate  choice  of  sampling  grid  size  and 
quantization  levels,  any  picture  function  can  be  regarded  as  indistinguishable 
from an $m \times n$ array with elements $p_{ij}$ in some bounded range $[0, 2^{k}]$. This
array  is  a  "faithful  description"  of  the  original  scene  in  the  sense  that 
both  have  virtually  the  same  information  content.   Picture  processing  research 
is  largely  concerned  with  the  effective  transmission  and  analysis  of  these 
digital  descriptions.   In  many  contexts,  however,  it  is  preferred  to  have  a 
description  that  is  not  faithful,  but  nevertheless  reflects  the  "essential" 
information  in  the  scene  relative  to  some  problem  context. 
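As a rough illustration of the sampling and quantization step described above, the sketch below digitizes a picture function defined on the unit square. The function name `digitize`, the `bits` parameter, and the choice of integer levels 0 to 2**bits - 1 are assumptions made for this example; the source gives only the general form of the bounded range.

```python
def digitize(picture, m, n, bits):
    """Sample picture(x, y) at cell centers of an m x n grid and quantize.

    Assumes picture returns intensities in [0, 1]; values outside that
    interval are clipped. Each element becomes an integer gray level.
    """
    levels = 2 ** bits
    array = []
    for i in range(m):
        row = []
        for j in range(n):
            x = (j + 0.5) / n          # horizontal center of cell (i, j)
            y = (i + 0.5) / m          # vertical center of cell (i, j)
            value = min(max(picture(x, y), 0.0), 1.0)
            row.append(min(int(value * levels), levels - 1))
        array.append(row)
    return array


def ramp(x, y):
    """A toy picture whose brightness increases from left to right."""
    return x


grid = digitize(ramp, m=2, n=4, bits=2)
print(grid)
```

A finer grid and more quantization levels make the array a more faithful description, at the cost of storing more elements.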


In this category is the work of Maruyama [3], who presents
several region-finding algorithms and several shape representation
techniques which easily lend themselves to feature extraction. This abstracted
information is in turn used in the global analysis. In [4] Jayaramamurthy
describes different methods of describing and analyzing textures, and in a
recent article Rosenfeld [5] described how pictures could be divided into
subregions through texture analysis.

Based  on  these  and  other  successful  attempts  in  low  level  picture 
processing,  it  is  apparent  that  we  need  a  global  picture  analyzer   which  will 
try  to  use  all  local  information  and  features  present  in  the  scene. 

Fundamental to the development of higher level picture processing
procedures is the creation of a suitable picture representation for algorithms
and data. This representation should express the hierarchical structures of
elements with attributes and relations among them. Basic to this model are
a graph-structured data representation and graph transformational parsing
procedures, which are central issues in our work. The graph structural
representation model has the following features, which facilitate flexibility:

(1)  Attribute  values  associated  with  each  element  may  be  used  to  tie  down 
the  model  to  concrete  instances. 

(2)  Interaction  or  propagation  of  information  between  parsing  levels,  which 
is  needed  to  identify  objects  in  context,  is  readily  expressible. 

Context sensitivity in pictures is best expressed by Guzman in [6] as
follows:

"Given that a set is formed by components that locally (by
their shape, for example) are ambiguous, because they can have one of
several values (O = sun, ball, eye, hole, etc.) or meanings, can we make
use of context information ([symbol] occurs often) stated in the form of
models in order to assign to each component a value that the whole
set is consistent or makes global sense?"

Indeed, as we have found out, this context sensitivity plays a central
role in parsing pictures. At any level in parsing, although the pieces match
locally, the global match can be rejected because of contextual differences
(relations). For example, if a composite object containing constituents A
and B is to be formed, additional relations (constraints) between A and B
must be met.
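The point about contextual constraints can be made concrete with a small Python sketch. The relation names (`ABOVE`, `ADJACENT`), the part names, and the triple encoding of relations are hypothetical; the sketch only shows that a composite is accepted when every required relation holds between its constituents, even if each constituent matches locally.

```python
def can_form_composite(part_a, part_b, relations, required):
    """Check the relational constraints between two locally matched parts.

    relations: set of (relation name, first node, second node) triples
               observed in the scene.
    required:  relation names that must all hold from part_a to part_b.
    """
    return all((name, part_a, part_b) in relations for name in required)


# Hypothetical relations extracted from a scene.
scene_relations = {
    ("ABOVE", "roof", "wall"),
    ("ADJACENT", "roof", "wall"),
    ("ADJACENT", "wall", "tree"),
}

required = {"ABOVE", "ADJACENT"}
print(can_form_composite("roof", "wall", scene_relations, required))
print(can_form_composite("wall", "tree", scene_relations, required))
```

Here the second pair is rejected because one required relation is missing, although both nodes exist in the scene.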

1.2   Formulation  of  the  Problem 

In this thesis, the scene is represented as a graph, where nodes
correspond to the primitive regions of the scene and branches are the existing
relations between these regions. A descriptive pattern analysis attempts to
build descriptions of patterns. We refer to the procedures which form the
core of this analysis as parsing procedures. The central problems attacked
by these procedures are the location, isolation, and identification of objects
in a picture. The current lack of computational sophistication in attacking
these problems is attested to by the elementary (by human standards) image
processing that today's systems are able to perform.

This  graph  representation  of  scenes  facilitates  recognition  procedures 
based  on  graph  transformations.   The  recognition  process  is  viewed  as  the 
application  of  proper  replacement  rules  to  the  graphical  representation  of 
the scene. In addition, the imposition of a partial ordering on the graphical
model of the objects known to the recognizer facilitates the automatic
inference of these replacement rules.

Other operations, such as creating a branch or merging two nodes, which may
be dictated by the current picture segmentation strategy, are required
in the search for possible domains in which to apply the graph transformations
inferred by the graph structural model system.
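One of the auxiliary operations mentioned above, merging two nodes, can be sketched as follows. This is an illustrative Python version that assumes the scene graph is stored as a set of (relation, first node, second node) triples; it is not the SOL operation itself, and the node and relation names are hypothetical.

```python
def merge_nodes(branches, keep, absorb):
    """Merge node `absorb` into node `keep`.

    Every branch touching `absorb` is redirected to `keep`; branches that
    would connect `keep` to itself after the merge are dropped.
    """
    merged = set()
    for relation, first, second in branches:
        first = keep if first == absorb else first
        second = keep if second == absorb else second
        if first != second:
            merged.add((relation, first, second))
    return merged


branches = {
    ("ADJACENT", "r1", "r2"),
    ("ADJACENT", "r2", "r3"),
    ("CONTAINS", "r4", "r2"),
}

print(sorted(merge_nodes(branches, keep="r1", absorb="r2")))
```

Such a merge might be used when a segmentation step has split one region into two, so that the relations of both pieces need to be attributed to a single node.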

The  development  of  a  suitable  processing  language  has  been  a  necessary 
part of the solution to this problem. Currently, adequate languages exist to
express  algorithms  which  operate  on  the  array  representation  of  pictures  as 
binary  valued  elements  with  neighborhood  connectivity  relationships.   The 
next  and  most  natural  abstraction  from  the  array  representation  is  a  direct 
graph  structure  representation,  by  means  of  which  many  scene  segmentation 
algorithms  can  be  simply  expressed.   This  language  must  have  the  operations 
necessary  to  implement  structure  transformations. 

A  structure  operational  language  has  been  our  choice  for  precisely 
specifying  and  experimenting  with  heuristic  picture  processing  strategies  in 
this  research. 

Languages to implement picture processing algorithms are divided into
two categories: descriptive graphical languages and graph structure
processing languages. Descriptive graphical languages have tended to be
display-oriented and of limited use in the recognition of pictures. In this
class are the systems and languages of Herzog [7] and Kulsrud and Williams [8].
In Schwebel [9], some criteria for graphic languages are presented and a
language, ICON, to meet these criteria is defined. Graph structure processing
languages may be distinguished by the presence of operations which allow
analysis of descriptions. Chase [10] uses a system for graph manipulation.
Graph structures of greater generality can be treated with the languages given
in Pratt and Friedman [11], Earley [12], Lieberman [13], Wolfberg [14], and
Crespi-Reghizzi and Morpurgo [15]. Webs have been used by J. L. Pfaltz [16] in
describing some classes of pictures, which has made the recognition of
global patterns feasible.


The following requirements, which are essential to any general
graph-structural operation language, have led us to select SOL, which was
originally proposed by J. C. Schwebel [17]. These requirements are listed
in two parts: first the elements of the structure, and second the
requirements on the elements necessitated by the dynamic nature of the
processing.
Static Requirements

    Basic elements
        nodes (picture primitives or higher level objects),
        branches (relations between pairs of nodes),
        graphs (collection of nodes and branches),
        node attributes and their values,
        branch attributes and their values,
        graph attributes and their values.

    Sets of elements
        arbitrary sets of nodes and branches,
        arbitrary sets of graphs,
        graph levels (explicit substructures),
        pointers to elements,
        attributes and values for sets of elements.

Dynamic Requirements

        add-delete operations,
        functions on attribute values,
        pointer move operations,
        node operations,
        set operations (Boolean),
        structure replacement operations,
        NAME and TYPE functions,
        TAIL and HEAD functions,
        subgraph operations.
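To give a feel for what these requirements ask of a host language, here is a minimal Python sketch of a graph with attributed nodes and branches, add and delete operations, TYPE, TAIL and HEAD functions, and an induced-subgraph operation. It is not SOL and does not reproduce its syntax; the class and method names are hypothetical, and the convention that a branch runs from its tail to its head is an assumption.

```python
class Graph:
    """A labeled graph whose nodes, branches, and self carry attributes."""

    def __init__(self, **attributes):
        self.attributes = dict(attributes)  # graph attributes and values
        self.nodes = {}      # node name -> attribute dict
        self.branches = {}   # branch name -> (type, tail, head, attribute dict)

    def add_node(self, name, **attributes):
        self.nodes[name] = dict(attributes)

    def add_branch(self, name, relation, tail, head, **attributes):
        if tail not in self.nodes or head not in self.nodes:
            raise KeyError("both end nodes must exist before adding a branch")
        self.branches[name] = (relation, tail, head, dict(attributes))

    def delete_node(self, name):
        """Delete a node together with every branch incident to it."""
        del self.nodes[name]
        self.branches = {b: v for b, v in self.branches.items()
                         if name not in (v[1], v[2])}

    def type_of(self, branch):
        return self.branches[branch][0]

    def tail(self, branch):
        return self.branches[branch][1]

    def head(self, branch):
        return self.branches[branch][2]

    def subgraph(self, names):
        """Return the subgraph induced by the given set of node names."""
        sub = Graph(**self.attributes)
        for node in names:
            sub.add_node(node, **self.nodes[node])
        for b, (relation, tail, head, attrs) in self.branches.items():
            if tail in names and head in names:
                sub.add_branch(b, relation, tail, head, **attrs)
        return sub


scene = Graph(source="toy scene")
scene.add_node("r1", shape="square", size="big")
scene.add_node("r2", shape="rectangle", size="big")
scene.add_node("r3", shape="trapezoid")
scene.add_branch("b1", "CONTAINS", "r1", "r2")
scene.add_branch("b2", "ABOVE", "r3", "r1")
print(scene.type_of("b2"), scene.tail("b2"), scene.head("b2"))
```

Pointer operations, Boolean set operations, and structure replacement are omitted here; Python's built-in `set` type could plausibly serve for the Boolean operations on sets of node or branch names.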

The SOL language, with some modification at the conceptual level, has been
implemented. The language is implemented by embedding it in PL/1, which is
basically a procedural language. There are some inherent inefficiencies in
the list processing capability of PL/1, but this language has proved
excellent for our demonstrative purposes.

The general organization of this thesis is as follows: in chapter two
we introduce different existing methods by which pictures can be represented
in a graphical form, and how the abstracted lower level information
is propagated to this graph. Generally, the attribute values of graphical
elements are passed to the graph by a set of feature extraction procedures, and
are used as semantic classificatory information to guide the recognizer. We
also lay down the basic definition of a graph structure, which is the essential
tool for representing the model of our universe.

In  chapter  three  the  theoretical  ground  rules  of  graph  transformations 
are  discussed.   Here,  we  show  the  factors  which  affect  the  choice  of  domain 
for  transformations,  and  further  by  introducing  a  useful  class  of  relations, 
we  show  the  context  sensitivity  of  these  transformations,  and  provide  rules 
that must be satisfied for the feasibility of transformations. We also show
that our suggested set of relations is closed under some useful operations.
In  [17]  Schwebel  has  set  rules  for  general  binary  relations,  but  his  rules 
are  too  restrictive  to  be  useful  in  actual  scene  analysis. 

In chapter four we introduce the graph structural representation of the
universe's model and body of knowledge. In our model, the rules represent
the skeleton of the objects to be encountered in our universe. In developing
the formal description of the system and parsing procedure, we also show
the similarity with human perception and the psychological concept of
attention. We further discuss the use of associated attributes as a
heuristic means of speeding up the recognition process and the propagation of
knowledge from the attention point to higher level objects. Here, the set of
operations defined on the set of relations is extended to operate on the
graph structure, and it is shown how a much larger class of objects can be
recognized using these operations. The set of graphs (rules) in the graph
structure forms a partially ordered set, and the immediate successors of any
graph can also be ordered according to some criterion (frequency of
occurrence, etc.). A few aspects of parallel processing are also discussed in
this chapter.

In  chapter  five  we  have  introduced  our  experimental  universe.  Graph 
structure  representation  of  this  universe  and  associated  parsing  procedures 
have  been  written  in  SOL.  The  computer  results  of  this  experimentation  are 
tabulated  and  presented  here  in  this  chapter.  The  application  of  this 
general  methodology  of  recognition  to  this  simple  class  of  pictures,  namely 
"simplified  cartoons  in  coloring  books",  has  led  us  to  the  development  of  a 
set  of  useful  concepts  like  best-match  recognition  of  incomplete  objects. 

In  chapter  six  the  concept  of  learning  is  investigated  and  we  will  show 
that  it  is  rather  straightforward  to  incorporate  it  in  our  recognition 
system. The learning aspect of artificial intelligence is of extreme
importance. "Learning" is a term which has been broadly discussed
and disputed by many psychologists. We define "learning" as:

"Ability to recognize the incomplete objects, introduce new
objects to the universe, modify the conception of known objects,
and enhance the performance of recognition through experience."




Many modern string language compilers have the capability of detecting
and correcting errors in the input stream. But this is not considered
to be a learning capability, because it is just a fixed automaton, and the
performance would be the same from one run to another. Also, it is difficult
to introduce new patterns to the language grammar or to modify the existing
patterns to enable the system to accept a broader class of patterns. These
capabilities are incorporated in our system in the following manner.

The  ability  to  recognize  an  incomplete  object  is  tackled  by  the  "best- 
match  concept".   That  is,  we  can  find  the  object  which  is  closest  in  structure 
and semantics to the image and, if we are in "learning-mode," this information
is saved in the "artificial brain."
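The best-match idea can be hinted at with a deliberately simplified Python sketch. It assumes each known object is modeled only as a set of component labels and scores a model by the fraction of its components found in the image; the thesis's matching also takes relations and semantic attributes into account, which this sketch ignores. All names are hypothetical.

```python
def best_match(observed, models):
    """Return (model name, score) for the model closest to the observation.

    observed: set of component labels found in the image.
    models:   dict mapping a model name to its non-empty set of components.
    The score is the fraction of the model's components that were observed.
    """
    def score(components):
        return len(observed & components) / len(components)

    name = max(models, key=lambda m: score(models[m]))
    return name, score(models[name])


# Hypothetical models of two objects.
MODELS = {
    "face": {"head", "eye_left", "eye_right", "mouth"},
    "house": {"roof", "wall", "door", "window"},
}

# An incomplete face: one eye is missing from the image.
observed = {"head", "eye_left", "mouth"}
print(best_match(observed, MODELS))
```

In a learning mode, the incomplete instance and its score could then be recorded alongside the matched model.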

The ability to introduce a new object is implemented by saving the
structural and semantic information of the image as the initial conception of
this object, which is of course subject to modification through experience
with other instances of the same object.

As for modification of the conceptions, we record the perceived
structural and semantic variations of the object in temporary storage, and
use some learning criterion to transfer these into permanent memory.

It  is  clear  that  this  additional  information  will  enhance  the  system's 
performance,  both  in  enlarging  the  class  of  recognizable  objects  and  in 
speeding up the recognition. We have used a scoring system, in the form of
weighted branches between the objects in the graph structure, which speeds up
the recognition of more frequently occurring combinations.
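A scoring scheme of this kind might look like the following Python sketch, which keeps a weight on each (object, successor) branch and tries heavier branches first. The class name, the unit increment, and the example object names are assumptions for illustration; the source does not specify how the weights are updated.

```python
from collections import Counter


class SuccessorScores:
    """Weights on branches between objects, raised on each successful use."""

    def __init__(self):
        self.weights = Counter()  # (parent, child) -> accumulated score

    def reward(self, parent, child):
        """Record that the combination parent -> child was recognized."""
        self.weights[(parent, child)] += 1

    def ordered(self, parent, children):
        """Return children with the most frequently rewarded ones first.

        Ties keep their original order because sorted() is stable.
        """
        return sorted(children, key=lambda c: -self.weights[(parent, c)])


scores = SuccessorScores()
scores.reward("eye", "face")
scores.reward("eye", "face")
scores.reward("eye", "owl")

print(scores.ordered("eye", ["owl", "face", "target"]))
```

The recognizer would then test candidate higher level objects in this order, so that frequent combinations are confirmed after fewer attempts.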

In chapter seven we explain the generality of our methodology.
Although stiff restrictions have made our demonstration unattractive for
immediate pragmatic applications, it is argued that once the other parts of
the overall system, which are being extensively investigated or have already
been accomplished, are available, this methodology will immediately emerge as
a practical and useful tool in artificial intelligence. We also discuss the
other interesting features of this methodology, which we have barely touched
upon in the course of this research. It is also my conviction that in the long
run experiments with this kind of system will lead to the creation of
intelligent data bases which are relational in nature.

In Appendix A we present the definition of SOL and discuss
its implementation.

In  Appendix  B  a  sample  SOL  program  which  represents  an  input  scene 
image  is  given. 




2.   GRAPHICAL  REPRESENTATION  OF  PICTURES 

Our prime objective in this chapter is to demonstrate that the
information present in a scene is best described in a graphical form, and
further to show the generality of region-node representation. J. L. Pfaltz
in [16] states:

"A picture, photograph, or scene may be regarded as a function
p(x,y) over some two dimensional domain. This is a 'faithful description'
of the original scene in the sense that both have virtually the same
information content. Picture processing research is largely concerned with
the effective transmission and analysis of these digital descriptions.

In many contexts, however, one would prefer a description that is
not faithful, but that nevertheless reflects the 'essential' information in
the scene relative to some problem context."

For example, the strings "an airplane is hidden in the woods," or "a boy
is playing with a ball," may be far more adequate (and economical)
descriptions than equivalent gray level descriptions of digitized pictures.
The implication is that in many actual situations there may be linguistic
structures which

(1)  are  adequate  descriptions  of  the  pictures, 

(2)  present the "essential" information content of the picture in a
more readily usable form,

(3)  are more economical in terms of storage and transmission
requirements, in that redundant information has been suppressed,

(4)  have  a  built-in  hierarchical  structure, 

(5)  make  the  addition  and  deletion  of  objects  easy. 




We may formalize the idea of a picture description as follows: Let $P$
denote a class of pictures. By a picture description language (PDL [18]) for
$P$, we mean a set $\mathcal{L}$ of linguistic structures such that for each
$p \in P$ there exists an $L_p \in \mathcal{L}$ called the description of $p$.
Thus we have a picture description function (PDF), call it $D$, mapping $P$
into $\mathcal{L}$. In general there will be cases where $p_1, p_2 \in P$ are
distinct, but $D(p_1) = D(p_2)$ in $\mathcal{L}$. Differences between $p_1$
and $p_2$ in such a case are considered to be "noise" with respect to $D$.
There are also cases where $L, L' \in \mathcal{L}$ are distinct, but they are
both descriptions of the same object. In this case, the language
$\mathcal{L}$ is said to be semantically ambiguous.
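The notion of noise with respect to a description function can be illustrated with a toy example in Python. The picture encoding (a list of shapes with positions) and the function `describe`, which plays the role of D, are assumptions invented for this sketch and are not the PDL of [18].

```python
def describe(picture):
    """A toy description function D.

    A picture is a list of (shape, x, y) triples. The description keeps
    only which shapes occur, in sorted order, and discards positions.
    """
    return tuple(sorted(shape for shape, _x, _y in picture))


p1 = [("square", 0, 0), ("circle", 5, 5)]
p2 = [("circle", 1, 2), ("square", 9, 9)]

# p1 and p2 are distinct pictures with the same description, so their
# differences (here, the positions) count as noise with respect to D.
print(p1 != p2, describe(p1) == describe(p2))
```

Choosing what D discards is therefore a design decision: a description that ignored shape as well would treat far more differences as noise.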

In examples like "this is a picture of a house," the picture
descriptions are all strings of symbols. This is natural, since humans usually
use linear linguistic structures for communication. But the string-like
structure of natural languages seems to be inappropriate, or at least
inefficient, for general picture-description purposes. Consider, for example,
the picture shown in Fig. 2.1 and its English description. This English
language description is certainly not the only one possible, nor is it optimal
in any sense; nevertheless, it may serve as an illustration. In particular, we
notice that we have singled out and identified basic objects as "atoms,"
which are considered to be picture primitives; have identified certain
properties of objects (size, shape, color, texture, etc.); have identified
certain relations between objects (relative positions); and have indicated
which objects have which properties, and which objects enter into which
relationships.

In contrast to this English description, consider the linguistic
structure of Fig. 2.2 as a description of the picture in Fig. 2.1. Clearly,
Fig. 2.2 conveys the same information about the picture, but it is in a more


The picture consists of 2 squares, 2 rectangles, and 1 trapezoid; one
of the squares has texture #T1, the other one has texture #T0. The big
square "contains" the big rectangle and the small square; the trapezoid
is directly above the big square; etc.

Fig.  2.1.   A  picture  and  its  English  description. 


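[Figure 2.2: tree diagram. The root <PICTURE> branches into <OBJECTS>,
<ATTRIBUTES>, and <RELATIONS>. Attributes include SHAPE (SQUARE, CIRCLE, ...),
SIZE, ..., and <TEXTURE> (#T0, #T1, ...). Relations include CONTAINS and
ABOVE, each linked to a pair of objects.]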

Fig.  2.2.   A  labeled  graph  description  of  Fig.  2.1 



usable  form.   It  is  immediately  implementable  as  a  linked  data  structure 
in  which  one  can  quickly  determine  which  objects  are  square,  what  shape  or 
size  each  is,  etc.   Of  course  the  linguistic  structure  of  Fig.  2.2  is  merely 
a  simple  alternative  to  linear  picture  description  languages,  and  was  stated 
to  show  the  existence  of  these  alternatives. 
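As a sketch of how such a labeled structure might be held as a linked data structure, the Python fragment below encodes the picture of Fig. 2.1 and answers the kinds of questions mentioned above. The object names `p1` to `p5`, the dictionary layout, and the assignment of textures to particular squares are assumptions; the English description does not say which square has which texture.

```python
# Objects with their attributes, and relations as (name, first, second).
picture = {
    "objects": {
        "p1": {"shape": "square", "size": "big", "texture": "#T0"},
        "p2": {"shape": "square", "size": "small", "texture": "#T1"},
        "p3": {"shape": "rectangle", "size": "big"},
        "p4": {"shape": "rectangle", "size": "small"},
        "p5": {"shape": "trapezoid"},
    },
    "relations": [
        ("CONTAINS", "p1", "p3"),
        ("CONTAINS", "p1", "p2"),
        ("ABOVE", "p5", "p1"),
    ],
}


def objects_with(picture, attribute, value):
    """Names of all objects whose attribute has the given value."""
    return sorted(name for name, attrs in picture["objects"].items()
                  if attrs.get(attribute) == value)


def related(picture, relation, first):
    """Names of all objects standing in `relation` to the object `first`."""
    return sorted(second for name, a, second in picture["relations"]
                  if name == relation and a == first)


print(objects_with(picture, "shape", "square"))
print(related(picture, "CONTAINS", "p1"))
```

Both queries are simple lookups, whereas extracting the same facts from the English sentence would first require parsing it.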

2.1  Regions as Graph Nodes

We are convinced that graphs are the best tools to describe pictorial
information; here we investigate the conversion of pictorial information
into graphical structures. Graphs are widely used for the convenient
representation of the geometric relations implicit in a picture consisting
of many regions. By using a graph representation, many problems of picture
analysis become feasible with the provision of appropriate heuristic
algorithms. The study of Guzman [19] (1968) demonstrated the significance of
the graphical representation of scenes of simple three-dimensional shapes,
using this representation to analyze and cluster planar regions into
three-dimensional objects by extracting certain properties of the vertices.
Eastman [20] (1970) finds a need for a better descriptive capability to
define the spatial relationships essential for two-dimensional space
planning. Here again a graph representation is invoked.

It  is  obvious  that  there  are  many  means  to  convert  a  picture  to  a  graph, 
even  with  fixed  criteria  of  what  information  should  be  preserved.    The  best 
way  to  arrive  at  our  method  of  conversion  is  to  investigate  different  alter- 
natives of  conversion  of  a  simple  line  drawing   to  a  graph  and  its  proposed 
descriptional  language. 

Let us construct graphs for the picture of Fig. 2.3(a).  Since each graph is
essentially a collection of nodes and a set of branches among them, the
first question to be answered is, "What physical parts of the picture should
correspond to the main abstract entities of the graph, namely nodes?"  There
are three possible answers to this question, which we will investigate in
turn.

Of  the  five  criteria  which  we  set  as  a  judgment  of  good  picture 
description,  the  first  three  are  satisfied  by  having  a  graphical  description 
language.   So  we  have  to  base  our  judgment  for  selecting  the  best  answer  on 
the  last  two  criteria. 

2.1.1   Vertex-primitive

The first choice would be to have the geometrical topology of the graph
correspond directly to that of the line drawing.  Usually graphs are
described in various matrix forms or in a sequential manner.  None of these
descriptions is flexible and powerful enough for our purposes.  However, for
illustrative purposes it is useful to present a sequential representation of
graph 2.3(b), similar to a method by Maruyama in [3].  Assume each node,
i.e. an intersection of more than two arcs, is denoted by the values of its
x,y coordinates.

Let us assume the following expression for each node:

$$\langle V \rangle (\{\langle a_j \rangle (\langle R_i \rangle , \langle V_k \rangle)\}^{*}),$$

where,

    $V$    ::=  node name
    $V_k$  ::=  node name different from $V$
    $a_j$  ::=  arc name
    $R_i$  ::=  region name,

the following semantics are implied:

[Figure panels: (a) picture of five regions, with regions R0 to R4, vertices
V1 to V4 and named arcs; (b) vertex-node (naming); (c) arc-node;
(d) region-node]

Fig. 2.3.   Example of graph representation of a picture.

1)  nodes $V_k$ are direct neighbors of the node $V$,

2)  nodes $V$ and $V_k$ are connected with an arc $a_j$,

3)  region $R_i$ is (partly) bounded by arc $a_j$, and $R_i$ is assumed by
    $a_j$ in counter-clockwise direction at node $V$, and

4)  the sequence of $R_i$'s is cyclic.

Applying the above representation, the picture of Fig. 2.3(a) can be
described as follows:

V1(a1(R0,V2),a7(R2,V4),a2(R1,V2))
V2(a1(R1,V1),a2(R2,V1),a3(R0,V3))
V3(a3(R2,V2),a6(R3,V4),a5(R4,V4),a6(R0,V4))
V4(a7(R0,V1),a6(R4,V3),a5(R3,V3),a6(R2,V3)).

In the above expressions, the arcs $a_j$ are scanned sequentially in
counter-clockwise order about node $V$.  Thus the sequence of $a_j$'s is
cyclic.  For example, the following two sequences are semantically
identical:

V1(a1(R0,V2),a7(R2,V4),a2(R1,V2))

V1(a7(R2,V4),a2(R1,V2),a1(R0,V2)).
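The cyclic equivalence of two node expressions can be checked mechanically.
The following Python sketch is only an illustration: the function names
`parse_node` and `same_node` are hypothetical, and the parser assumes the
flat form V(a(R,V'),...) used in the expressions above.

```python
import re


def parse_node(expr):
    """Split 'V1(a1(R0,V2),...)' into ('V1', [('a1', 'R0', 'V2'), ...])."""
    name, body = expr.split("(", 1)
    # Each arc entry has the form arc(region,neighbour).
    triples = re.findall(r"(\w+)\((\w+),(\w+)\)", body)
    return name, triples


def same_node(expr_a, expr_b):
    """True if both expressions name the same node and list the same
    (arc, region, neighbour) entries up to a cyclic rotation."""
    name_a, arcs_a = parse_node(expr_a)
    name_b, arcs_b = parse_node(expr_b)
    if name_a != name_b or len(arcs_a) != len(arcs_b):
        return False
    if not arcs_a:
        return True
    return any(arcs_a[k:] + arcs_a[:k] == arcs_b for k in range(len(arcs_a)))
```

A rotation of the entry list is accepted, while a reordering that is not a
rotation (and therefore reverses the counter-clockwise order) is rejected.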

As  we  observed  in  this  case,  sequential  representations  are  bulky  and 
often  semantically  ambiguous.   In  cases  where  arcs  do  not  convey  shape  infor- 
mation and  are  simply  straight  lines,  we  can  eliminate  them  from  expressions. 
But  still  the  most  natural  linguistic  description  would  be  the  one  which 
preserves  the  exact  graphical  structure  of  the  picture,  Fig.  2.4(a). 

Lack of hierarchy and difficulty in adding or deleting parts to or from the
picture make this alternative very unattractive for our picture processing
purposes.


[Figure panels: (a) vertex-node; (b) arc-node, where the operators +, -, x,
* are the same as in PDL; (c) region-node]

Fig. 2.4.   Graphical descriptions of the line drawing in Fig. 2.3(a)

2.1.2  Arc-primitive 

The next alternative would be to have the arcs in the line drawing
correspond to our picture primitive.  In [18] Alan C. Shaw defines, "A
picture primitive may be any n dimensional pattern with two distinguished
points, a tail and a head.  Primitives can be metrically concatenated
together only at their tail and head points.  Because the two points of
possible concatenation are specified, a primitive can be represented as a
labeled directed edge of a graph, pointing from its tail to head node."  In
this case the emphasis is still on having the picture graph and description
graph correspond in geometrical topology.  Based on this picture primitive
definition, he defines a picture description language (PDL).  The following
syntax will generate any sentence $S$ in PDL (expressed $S \in$ PDL):

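    $S   \rightarrow  P \mid (S\,\theta\,S) \mid ({\sim}S) \mid SL \mid (/SL)$,
    $SL  \rightarrow  S^{l} \mid (SL\,\theta\,SL) \mid ({\sim}SL) \mid (/SL)$,
    $\theta  \rightarrow  + \mid \times \mid - \mid *$,
    $P   \rightarrow$  {any primitive class name},
    $l   \rightarrow$  {any label designator}.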

For any $S \in$ PDL, he defines $P(S)$ as the set of all pictures with
description $S$.  At this level a picture $\alpha$ is described by the pair
$T(\alpha) = (T_s(\alpha), T_v(\alpha))$, where $T_s(\alpha) \in$ PDL and
$T_v(\alpha)$ is a list of descriptions $D(P)$ of each primitive of the
picture.  Not only primitives, but all pictures have a tail and a head
determined by their descriptions; concatenations among pictures can only
occur at their tail and head positions.  Consider the picture $\alpha$
consisting of two subpictures $\alpha_1$ and $\alpha_2$ such that
$\alpha_1 \in P(S_1)$, $\alpha_2 \in P(S_2)$ and
$T_s(\alpha) = (S_1\,\theta\,S_2)$, $S_1, S_2 \in$ PDL.  Then the tail and
head of $\alpha$ according to $T_s(\alpha)$ are defined:

$$\mathrm{tail}(\alpha) = \mathrm{tail}(\alpha_1), \qquad \mathrm{head}(\alpha) = \mathrm{head}(\alpha_2).$$

In  the  same  manner  as  primitives,  more  complex  pictures  can  thus  also  be 
represented  by  a  directed  edge  of  a  graph.   This  gives  us  the  necessary 
hierarchy  for  picture  processing. 

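The meaning of the binary concatenation operators {+, -, x, *} is presented
below by defining $P((S_1\,\theta\,S_2))$; it is assumed that
$S_1, S_2 \in$ PDL, $\alpha_1 \in P(S_1)$ and $\alpha_2 \in P(S_2)$.  The
notation cat means "is concatenated onto":

$$P((S_1 + S_2)) = \{\alpha_1, \alpha_2 \mid \mathrm{head}(\alpha_1)\ \mathrm{cat}\ \mathrm{tail}(\alpha_2)\},$$

$$P((S_1 - S_2)) = \{\alpha_1, \alpha_2 \mid \mathrm{head}(\alpha_1)\ \mathrm{cat}\ \mathrm{head}(\alpha_2)\},$$

$$P((S_1 \times S_2)) = \{\alpha_1, \alpha_2 \mid \mathrm{tail}(\alpha_1)\ \mathrm{cat}\ \mathrm{tail}(\alpha_2)\},$$

$$P((S_1 * S_2)) = \{\alpha_1, \alpha_2 \mid (\mathrm{tail}(\alpha_1)\ \mathrm{cat}\ \mathrm{tail}(\alpha_2))\ \mathrm{and}\ (\mathrm{head}(\alpha_1)\ \mathrm{cat}\ \mathrm{head}(\alpha_2))\}.$$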
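The four operators can be made concrete with a small Python sketch.  It
rests on assumptions that go beyond the text: primitives are taken to be
rigid straight-line segments, "cat" is modeled as translating the second
picture so that the named points coincide, and the names `Picture`,
`primitive`, `shift` and `concat` are hypothetical.  An empty $P(S)$ is
represented by `None`.

```python
from typing import List, NamedTuple, Optional, Tuple

Point = Tuple[float, float]


class Picture(NamedTuple):
    segments: List[Tuple[Point, Point]]  # line segments making up the picture
    tail: Point
    head: Point


def primitive(dx: float, dy: float) -> Picture:
    """A straight-line primitive with its tail placed at the origin."""
    return Picture([((0.0, 0.0), (dx, dy))], (0.0, 0.0), (dx, dy))


def shift(p: Picture, dx: float, dy: float) -> Picture:
    """Translate a whole picture, including its tail and head."""
    def move(q: Point) -> Point:
        return (q[0] + dx, q[1] + dy)
    return Picture([(move(a), move(b)) for a, b in p.segments],
                   move(p.tail), move(p.head))


def concat(op: str, a: Picture, b: Picture) -> Optional[Picture]:
    """Apply +, -, x or *; return None when the concatenation is impossible."""
    # Which point of b is attached to which point of a.
    anchors = {"+": (a.head, b.tail), "-": (a.head, b.head),
               "x": (a.tail, b.tail), "*": (a.tail, b.tail)}
    if op not in anchors:
        raise ValueError(f"unknown operator {op!r}")
    on_a, on_b = anchors[op]
    moved = shift(b, on_a[0] - on_b[0], on_a[1] - on_b[1])
    # '*' additionally requires the heads to coincide.
    if op == "*" and moved.head != a.head:
        return None
    # tail of the result is tail(a), head of the result is head(b).
    return Picture(a.segments + moved.segments, a.tail, moved.head)
```

With rigid segments, `concat("*", a, b)` succeeds only when both primitives
span the same displacement, which is one way a description can denote an
empty set of pictures.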

The graphs of the resulting pictures are illustrated in Fig. 2.5; t and h
indicate the tail and head of each expression.  Note that $P(S)$ could be
empty.  This is the case when the concatenations described by $S$ are not
possible according to the definitions of the primitives.  For example, if
$\ell_1$ is a line segment primitive with
head($\ell_1$) = $\{(x,y) \mid x = c_1\}$ and $\ell_2$ another line segment
primitive with tail($\ell_2$) = $\{(x,y) \mid x = c_2\}$, where
$c_1 \neq c_2$, then $P((\ell_1 + \ell_2))$ is empty.  However, the graph is
constructed by treating $\ell_1$ and $\ell_2$ abstractly.


The unary operators ~ and / do not describe concatenations but allow the
tail and head to be moved.  ~ is a tail/head reverser such that
tail((~S)) = head(S) and head((~S)) = tail(S).  The blanking or
superposition operator, /, works in conjunction with label designators to
allow multiple appearances of the same primitive in a description,
effectively relocating the tail or head of an expression.  The label serves
to identify the primitive or structure while the / operator indicates
retracing over its associated operand.

[Figure: graphs of $T_s(\alpha)$ for (S1 + S2), (S1 x S2), (S1 - S2) and
(S1 * S2), with t and h marking the tail and head]

Fig. 2.5.   Concatenation operators

The class of pictures of interest is described by a grammar g, and the
description $D(\alpha)$ of any picture $\alpha$ in this class is:

$$D(\alpha) = ((T_s(\alpha), T_v(\alpha)), (H_s(\alpha), H_v(\alpha)))$$

where

    $T_s(\alpha) \in L(g)$,

    $T_v(\alpha)$ is a list of descriptions of all primitives of $\alpha$,

    $H_s(\alpha)$ is the parse of $T_s(\alpha)$ according to g, and

    $H_v(\alpha)$ is a list of the descriptions of all nonterminals in
    $H_s(\alpha)$.

Using PDL, the line drawing in Fig. 2.3(a) is described by the string
language shown in Fig. 2.6.

The same line drawing of Fig. 2.3(a) can also be described by a sentence
different from the one in Fig. 2.6: namely,

T'(V  -  ((((-d3)  +  ((d1+(^))>Hd2+(^(/d);)))))Vcd42)*(.(d5,vd5)))  t 

which is far more complicated than the sentences of the grammar defined for
the class of this line drawing, and cannot easily be recognized as a cup.
Again, to solve this ambiguity problem we have to depart from string
languages and resort to graphical languages.  Graph 2.3(c) is the graph
representation of the cup, where the nodes are arc-primitives.  In Fig.
2.4(b) we have shown how the structure of this line drawing can be described
by a graph.  Here, again, a graphical language seems to be the proper way of
representing the information structures in the picture.


g:
    P        ->  ELLIPSE | HANDLE | CUP
    ELLIPSE  ->  (d1 * d2)
    HANDLE   ->  (d5 * d5)
    CUP      ->  ((~d3 + ELLIPSE) * ((d4 * (~HANDLE)) + d4))

    L(g) = {(d1 * d2), (d5 * d5),
            ((~d3 + (d1 * d2)) * ((d4 * (~(d5 * d5))) + d4))}

[Figure: drawings of the primitive classes and of the picture CUP
($\alpha_1$), with the parse tree of CUP into d3, ELLIPSE, d4 and HANDLE]

    $T_s(\alpha_1)$ = ((~d3 + (d1 * d2)) * ((d4 * (~(d5 * d5))) + d4))

Fig. 2.6.   Hierarchic description of a picture.
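The relation between the grammar g and the sentences of L(g) can be
illustrated by expanding nonterminals textually.  The Python sketch below
assumes the three productions for ELLIPSE, HANDLE and CUP as read from Fig.
2.6 and a non-recursive grammar; `RULES` and `expand` are hypothetical
names.

```python
# Productions of g as read from Fig. 2.6 (spacing removed).
RULES = {
    "ELLIPSE": "(d1*d2)",
    "HANDLE": "(d5*d5)",
    "CUP": "((~d3+ELLIPSE)*((d4*(~HANDLE))+d4))",
}


def expand(sentence, rules=RULES):
    """Replace nonterminals by their right-hand sides until none remain.

    Terminates only for non-recursive grammars such as g above.
    """
    changed = True
    while changed:
        changed = False
        for name, body in rules.items():
            if name in sentence:
                sentence = sentence.replace(name, body)
                changed = True
    return sentence
```

Expanding `"CUP"` yields the PDL sentence for the cup, and expanding each of
the three nonterminals in turn enumerates L(g).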

As  to  the  choice  of  arcs  as  picture  primitives,  although  the  hierarchi- 
cal representation  is  now  possible,  the  addition  and  deletion  of  objects  to 
the  scene  is  awkward  because  arcs  are  not  semantically  as  rich  as  regions. 
Of  course  we  can  use  this  approach  positively  as  a  preprocessing  stage  to  our 
analysis  in  finding  regions. 

2.1.3   Region  primitives 

The last choice of picture primitive, and the one we adopt, is the region.
Regions form the "atoms" of our picture description language.  These picture
primitives correspond to the primitive nodes in our graph structure.

Each  node  of  a  graph  represents  either  a  closed  or  an  open  region  of  a 
picture,  or  can  be  interpreted  to  have  a  graphical  structure  itself.   This 
enables  us  to  have  graphs  as  elements  of  a  graph  and  a  hierarchical  scheme 
of  presentation  and  interpretation  of  scenes.   Regions  are  preferred  to 
arcs  as  primitives,  also  because  of  their  ability  to  have  much  richer  seman- 
tical content,  which  makes  the  addition  and  deletion  of  objects  much  more 
feasible.   As  we  have  mentioned  before,  at  all  levels  of  picture  analysis  it 
is  of  prime  importance  to  have  all  pertinent  lower  level  information  readily 
available  to  answer  questions  and  resolve  ambiguities  through  interrogation 
of  this  information.   We  have  achieved  this  ability  through  association  of 
general  data  structures  with  our  graph  elements.   Node  attributes  carry  the 
semantic  information  about  our  picture  primitives,  and  further  in  a  graph 
transformational  system  they  can  carry  the  semantical  information  which  is 
acquired through the analysis process.  The following is a set of attributes
which we have found useful in picture processing applications.

a)   Shape information

Shapes represent a substantial part of the semantic information about
pictures.  In [3] Maruyama has discussed how smoothed contours of objects
can be obtained from a given black and white picture.  He has introduced
three basic shape representations: polygonal, pattern sequence, and skeleton
representations.  In our global level analysis, shape features are the most
compact and also generally sufficient level of abstraction of shapes.  These
features can be easily extracted from Maruyama's shape representations.  In
Table 2.1 we have shown a set of useful shape features from his thesis.

b)  Texture  information 

Visual texture is known to play an important role in the field of pattern
recognition.  In [4] Murthy has represented textures as seasonal time
series.  Since all we are interested in is the class of a texture from a
categorical point of view, Michalski's [2] VL1 system (variable valued
logic) is an excellent tool for representing the class of textures as a VL1
formula.  By subsequently imposing windows over the regions of the picture
and testing this VL1 formula, we can find the texture of each region.

c)  Color  attribute 

Colored pictures convey much richer information about the environment than
simple black and white pictures.  This color information can be associated
with our picture elements through a simple numbering of colors.  However, in
more sophisticated systems the color attribute can be a set of color
features, which can be easily extracted from pictures.

d)  Size attribute

Once the scale of the picture is known, this attribute will convey important
semantical information about the relative size of objects in the scene,
which is essential in analyzing 3-dimensional objects with variable
orientations.  Of course this attribute can be included as a feature of the
shape attribute, but because of its importance we have brought it up
separately.

                              Table 2.1
                       Selected Shape Features

Name         Description

PERIMETER    Perimeter p (unnormalized)
AREA         Area A (unnormalized)
PERIM/AREA   $1 - 2\sqrt{\pi A}/p$
M2           Degree of variance
M3           Degree of skewness
M4           Degree of elongation
MEAN-R       Mean of unnormalized elementary patterns
MAXAMP       Difference between the maximum and the minimum in
             elementary patterns
DEV-R        Deviation of elementary patterns
ALT#         Number of alterations in a pattern sequence wrt the mean
MAXIMAL#     Number of local maximal points in a pattern sequence
DEV-E        Deviation of edge lengths
CONVEXITY    Degree of convexity
SYMMETRY     Degree of symmetry
DEG-AS       Degree of angular symmetry
DEG-F        Degree of feasibility
DEV-ANG      Deviation of angles
ANG#         Number of angles less than 7/4 and greater than ~/4
VERTEX#      Number of vertices in a polygonal representation

2.2   Relations  as  Graph  Branches 

In  our  representation  a  branch,  directed  or  not  directed,  denotes  a 
relation  between  two  end  nodes.   Fig.  2.3(d)  represents  the  relational  graph 
description  of  Fig.  2.3(a),  and  by  selecting  a  few  familiar  relations,  the 
graph  description  of  this  line  drawing  would  be  as  simple  as  Fig.  2.4(c). 

The ability to associate any type of data structure with branches in a
relational graph has made our method of graph representation quite general.
It encompasses all the techniques discussed earlier.  For example, the
physical description of the border between two regions, which defines the
relation between these two regions, can be associated with the branch
representing this relation.

Among other useful branch attributes we briefly discuss the weight
attribute.  The weight attribute takes its values in the interval [0,1], and
so can further divide each type of relation into an infinite number of
different relations.  A weight of 0 simply dictates that the branch is
absent, and a weight of 1 tells us that the branch must exist.  In other
words the weight attribute value can be interpreted as the probability of
the branch's existence.  In a graph descriptional representation of
pictures, labeled branches are mainly used to transfer the positional
relations of the regions to the analyzer.  Although it may seem that a
single label is hardly sufficient to represent complex relational
information, as we mentioned above, we have the ability to associate with
each branch any additional information that we might need in different
applications.
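As a hedged illustration of a branch carrying a label, a weight and
arbitrary extra data, the following Python sketch uses a hypothetical
`Branch` record and a hypothetical `present` helper; neither is part of the
system described here.

```python
from dataclasses import dataclass, field


@dataclass
class Branch:
    tail: str
    label: str
    head: str
    weight: float = 1.0                         # 0 = absent, 1 = must exist
    extra: dict = field(default_factory=dict)   # any additional information

    def __post_init__(self):
        if not 0.0 <= self.weight <= 1.0:
            raise ValueError("weight must lie in [0, 1]")


def present(branches, threshold=0.0):
    """Keep the branches whose weight exceeds the threshold."""
    return [b for b in branches if b.weight > threshold]
```

For example, a border description could be stored under
`extra["border"]`, and `present(branches)` drops every branch of weight 0.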

Through this research we have found that the existence of certain relations
with known properties is of great help in analyzing pictures.

Since the actual extraction of these relations from the array representation
of the scene is the task of the preprocessor and is highly dependent on the
way each region is presented, we do not go into the details of how this
should be done.  Maruyama [3], in connection with shape representations, has
discussed the extraction of several useful relations.  To make the material
self-contained we include here some of his techniques for relation
extraction.

2.2.1   Containment  relation 

The topology of a plain black and white picture or regional picture can be
simply described by a tree, a containment tree.  For example, the
topological relation ("contains", "inside") of the pictures illustrated in
Fig. 2.7 can be described simply by the containment trees as illustrated,
where each node refers to a connected component of the figure or closed
region.  Thus such a tree is a subgraph of the graph which describes the
structure of the picture.

For array pictures, Buneman (1970) presented an algorithm which determines
this containment tree from information about the picture which is picked up
during a single scan of the array.  His algorithm strongly resembles a
generative grammar and a procedure for choosing the rule of the grammar to
operate.

His algorithm for generating containment trees assumes that the given
picture is described as a set of polygonal curves.  He also assumes that for
each such closed curve a point inside the curve, the center of gravity or
the AS-point (angularly simple), is available.  The procedure for deriving
trees is quite similar to the determination of intersections between a
straight line and a set of curves.  He emanates a ray from each given point,
which is inside the corresponding region, and determines the sequence of
intersections of the ray with the regional boundaries.  Without loss of
generality, rays are emanated along the X-coordinate.  A simple analysis of
the generated intersection sequence shows the possible containment
relations.

[Figure panels: (a) black and white array picture; (b) region bounded
picture; each shown with its containment tree, branches labeled
rc: CONTAINS]

Fig. 2.7.   Containment trees

Let us consider the picture of Fig. 2.8, which consists of five closed
regions.  The regions are denoted by $R_1$ through $R_5$ and their interior
points are $X_1$ through $X_5$, respectively.  For example, region $R_5$ is
contained in $R_2$ and, in turn, $R_2$ is contained in $R_1$.  This can be
determined by emanating a ray at $X_5$ and examining the sequence of
boundary intersections.  In this case, the sequence is $R_5, R_2, R_1$ and
is an ordered sequence.  It is interpreted as follows: for each pair of
adjacent elements, the left element is contained in the right element.
Thus, $R_5 \subset R_2$ and $R_2 \subset R_1$.

Let us consider region $R_3$.  Its sequence is $R_3, R_5, R_5, R_2, R_1$.
This means the emanated line at $X_3$ doubly crosses region $R_5$, which
clearly implies that $X_3$ is not contained in $R_5$; consequently $R_3$ is
not contained in $R_5$.  Thus any pair of identical elements should be
eliminated from the sequence.  After this shrinking process, the sequence
becomes $R_3, R_2, R_1$ and is interpreted as
$R_3 \subset R_2 \subset R_1$.  The region $R_1$ has its corresponding
sequence as $R_4, R_2, R_1$.  However, since we are determining which
regions contain $R_1$, the final sequence should be $R_1$.

From the above examples, the following simple grammar is derived.  Let $R$
be a sequence,

$$R = R_{i_1}, R_{i_2}, \ldots, R_{i_j}, \ldots, R_{i_m}$$

generated at a point $x_k$ of the region $R_k$.  Let us assume that the
symbols $\alpha$, $\beta$, $\gamma$ denote any sequences, including the null
sequence.  Then the grammar consists of the following two rules:

RL1.   $\alpha R_{i_j} \beta R_{i_j} \gamma \rightarrow \alpha \beta \gamma$

RL2.   Let $x_k$ be the point of $R_{i_j}$;
       $\alpha R_{i_j} \beta \rightarrow R_{i_j} \beta$.


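[Figure panels: (a) picture of five regions with interior points $X_1$ to
$X_5$; (b) generated sequences: $X_1$: R4,R2,R1; $X_2$: R4,R2,R1;
$X_3$: R3,R5,R5,R2,R1; $X_4$: R4,R2,R1; $X_5$: R5,R2,R1; (c) sequences after
application of RL1 and RL2; (d) direct containment relations
$R_2 \subset R_1$, $R_3 \subset R_2$, $R_4 \subset R_2$, $R_5 \subset R_2$
and the containment tree]

Fig. 2.8.   An algorithm to generate containment trees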

For any generated subsequence (or complete sequence), RL1 is applied as many
times as possible, and then, finally, RL2 is applied.

In Fig. 2.8(b), the generated sequences at $X_1$ through $X_5$ are listed;
(c) shows the results obtained after the application of rules RL1 and RL2;
and the final containment relations as well as the containment tree of
picture (a) are illustrated in (d).  In (d) only direct containment
relations are exhibited, and transitive containment relations such as
$R_5 \subset R_1$ are eliminated.
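The two rules can be sketched in Python as follows.  This is an illustrative
reading of RL1 and RL2, not the original implementation; the function names
`reduce_sequence` and `direct_containment` are hypothetical, and the input
sequences are those read from Fig. 2.8(b).

```python
def reduce_sequence(sequence, own):
    """Apply RL1 as often as possible, then RL2, to one ray sequence."""
    seq = list(sequence)
    # RL1: repeatedly delete a pair of identical elements.
    i = 0
    while i < len(seq):
        if seq[i] in seq[i + 1:]:
            j = seq.index(seq[i], i + 1)
            del seq[j]
            del seq[i]
            i = 0
        else:
            i += 1
    # RL2: drop every element that precedes the region's own name.
    return seq[seq.index(own):]


def direct_containment(sequences):
    """Map each region to the region that directly contains it, if any."""
    parents = {}
    for own, seq in sequences.items():
        reduced = reduce_sequence(seq, own)
        if len(reduced) > 1:
            parents[own] = reduced[1]  # right neighbour = direct container
    return parents
```

Only the element immediately to the right of a region's own name is kept as
its father, so transitive relations never enter the tree.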

2.2.2   Neighborhood  relations 

Relations such as "next"; "above" or "below", "above and touching" or
"below and touching"; "left of" or "right of", "left of and touching" or
"right of and touching"; "behind" or "in front of", "behind and touching" or
"in front of and touching", are considered to be neighborhood relations.
These relations can be defined only between brother regions, where regions
which have an identical direct father in the containment tree are said to be
brothers.  Thus, after construction of a containment tree for a given
picture, we know which regions of the picture are brothers.

The determination of relations between a pair of neighboring regions is as
simple as the construction of a containment tree.  Let us assume that the
regions $R_1, R_2, \ldots, R_j, \ldots, R_n$ are brothers, and let us assume
that we are determining the relations between a region $R_j$ and the rest of
the regions $R - \{R_j\}$, where

$$R = \{R_1, R_2, \ldots, R_j, \ldots, R_n\}.$$

We also assume that $x_j \in R_j$ for $j = 1, \ldots, n$.  To determine
relations, a generalized primal pattern sequence is generated about $x_j$
within the space of $R - \{R_j\}$; here each elementary pattern has as an
attribute the sequence of regions which have been pierced.  Thus, for this
type of pattern sequence about $x_j$, we may ignore measuring the distance
between $x_j$ and the intersection of an elementary pattern with a region;
however, we detect the region name and whether or not this intersection
point is on the boundary of both regions.

In Fig. 2.9, we have illustrated the process of determining these relations.
Fig. 2.9(a) shows a picture of eight regions and its containment tree.  From
the tree, regions $R_5$, $R_6$, $R_7$ are brothers with a common father.
Only regions that are brothers are illustrated in (b), and their possible
relation graph is exemplified.  In (c) a picture of three regions is shown,
and the process of determining relations between $R_2$ and $R - \{R_2\}$ is
illustrated as the generation of a primal pattern sequence.  From the
sequence we can determine all possible relations.  The sequence generated at
$X_2$ is

$$S(X_2) = s_0, s_1, \ldots, s_{12} = 1,1,1,0,0,0,0,0,0,0,0,1,1,$$

where 0 denotes empty space and 1 denotes $R_1$.  We also have the
information whether these intersections are on the boundary of both regions
or not.  From the sequence $S(X_2)$ and this information, it becomes clear
that $R_2$ is not connected to $R_1$ or $R_3$, and further, from the indices
of the elementary patterns, we derive: $R_1$ is at the left of $R_2$.

In general, in the case of two dimensional pictures, $S(X_j)$ can be divided
into four regions.

    S(X_j) =  s_0,  s_1,  ...,      s_{N/2},      ...,  s_{N-1}
              |------ below ------|  |------ above ------|
              |-left of-|  |----- right of -----|  |-left of-|

From these illustrations, it should be clear that much more sophisticated
relations could be discovered by examining these kinds of pattern sequences.
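A minimal Python sketch of this sector analysis follows.  It assumes that
the first half of the sequence lies below and the second half above, and
that the first and last quarters lie to the left and the middle half to the
right, as the division above suggests; it also assumes that a relation is
reported only when a region is seen on one side alone.  The name
`neighborhood_relations` is hypothetical.

```python
def neighborhood_relations(seq, empty=0):
    """Derive left/right and below/above relations from a ray sequence.

    seq[i] is the region pierced by elementary pattern i, or `empty`.
    """
    n = len(seq)
    horizontal, vertical = {}, {}
    for i, region in enumerate(seq):
        if region == empty:
            continue
        # First and last quarter: left of; middle half: right of.
        side = "left of" if (4 * i < n or 4 * i >= 3 * n) else "right of"
        # First half: below; second half: above.
        level = "below" if 2 * i < n else "above"
        horizontal.setdefault(region, set()).add(side)
        vertical.setdefault(region, set()).add(level)
    relations = {}
    for region in horizontal:
        found = []
        for sides in (horizontal[region], vertical[region]):
            if len(sides) == 1:  # seen on one side only
                found.extend(sides)
        relations[region] = found
    return relations
```

For the sequence 1,1,1,0,0,0,0,0,0,0,0,1,1 this reports region 1 as
"left of" only, because it is pierced both below and above the center point.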


[Figure panels: (a) a picture and its containment tree, branches labeled
rc: CONTAINS; (b) brotherhood regions R5, R6, R7 and their relation graph,
branches labeled rR: RIGHT OF and rL: LEFT OF; (c) a picture of three
regions with the primal pattern sequence $S(x_2) = s_0, s_1, \ldots$
generated about $x_2$]

Fig. 2.9.   An algorithm to generate neighborhood relations


In the case of three dimensional pictures we can have another sequence of
rays, $S'(X_j)$, whose plane is perpendicular to the plane of $S(X_j)$.

    S'(X_j) =  s'_0,  s'_1,  ...,     s'_{N/2},     ...,  s'_{N-1}
               |------ below ------|  |------ above ------|
               |-back of-|  |----- front of -----|  |-back of-|

2.3   Graph Definition

Before getting into the formal definition of our graph-structure, it is in
order to mention WEBS as the immediate predecessor of our system.  In [16]
J. L. Pfaltz defines a web as follows:

A directed graph $G = (S, R)$ is a set $S$ of points (or nodes) together
with a relation $R$ on $S$.  Recall that a relation on a set $S$ is a subset
of $S \times S$, that is, a set of ordered pairs
$\{(a,b) \mid a, b \in S\}$.  $R$ is usually called the set of arcs (or
edges) of $G$.  If there exists a function $\lambda: S \rightarrow V_S$,
where $V_S$ is any set of symbols ("vocabulary"), then $G$ is said to be a
point-labeled graph over $V_S$.  If there exists a function
$H: R \rightarrow V_R$ for some set of symbols $V_R$, then $G$ is said to be
an edge-labeled graph over $V_R$.  Point or edge labeled graphs are called
webs.

Pfaltz and Rosenfeld [21] have shown that the concept of a phrase-structure
grammar can be extended in a natural way from strings to webs.  In our
graphs we have both nodes and branches labeled.

Let,

    $N = \{n_1, n_2, \ldots, n_m\}$    set of nodes,
    $B = \{b_1, b_2, \ldots\}$         set of branches,
    $A = \{a_1, a_2, \ldots, a_v\}$    set of attributes,
    $U = \{u_1, u_2, \ldots\}$         set of labels for branches.

Then a graph $g_i$ is defined as:

    $g_i = [N_i, B_i \mid A_i, U_i]$

    $G = \{g_1, g_2, \ldots, g_q\}$    set of graphs

where,

    $N_i \subset N$,
    $B_i \subset B$,
    $A_i \subset A$,
    $U_i \subset U$,

and since each branch $b_i = (n_j, u_k, n_\ell)$, or

    $B \subset N \times U \times N$,

$N_i$ must contain all the nodes of the branches in $B_i$.  $m$ is the
maximum number of nodes allowed by the implementation.

Definition:   $g_i$ is a subgraph of $g_j$ if and only if
$g_i \subset g_j$.

$A$ is a set of attributes for nodes, branches, and graphs;

    $A = N_a \cup B_a \cup G_a$,

each $a_i$ is a function,

    $a_i : [n \mid b \mid g] \rightarrow v_i$,

where $v_i$ is a set of values.

Equivalently, we may consider the branches as a set of relations
$\{R_{u_1}, R_{u_2}, \ldots\}$ where

    $R_u \subset N \times N$,

and a branch $b_i = (n_j, u_k, n_\ell)$ exists if and only if

    $(n_j, n_\ell) \in R_{u_k}$.
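The equivalence between labeled branches and a family of relations, and the
subgraph condition, can be sketched in Python.  The representation of a
graph as a pair (set of nodes, set of (tail, label, head) triples) and the
names `relations_by_label`, `well_formed` and `is_subgraph` are assumptions
made for illustration; attributes are omitted.

```python
def relations_by_label(branches):
    """Group branches (n_j, u_k, n_l) into one relation R_u per label u."""
    relations = {}
    for tail, label, head in branches:
        relations.setdefault(label, set()).add((tail, head))
    return relations


def well_formed(graph):
    """N_i must contain all the nodes of the branches in B_i."""
    nodes, branches = graph
    return all(t in nodes and h in nodes for t, _, h in branches)


def is_subgraph(g_i, g_j):
    """g_i is a subgraph of g_j: its nodes and branches are all in g_j."""
    nodes_i, branches_i = g_i
    nodes_j, branches_j = g_j
    return nodes_i <= nodes_j and branches_i <= branches_j
```

A branch (n_j, u_k, n_l) is present exactly when the pair (n_j, n_l) appears
in `relations_by_label(branches)[u_k]`.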

Let $G = \{g_1, g_2, \ldots, g_q\}$ represent a set of graphs, where $g_i$
is defined as above.  The following functions are defined to operate on the
elements of the graph or the graphs themselves.

TAIL:      $B \rightarrow N$,    TAIL$(n_j, u_k, n_\ell) = n_j$

HEAD:      $B \rightarrow N$,    HEAD$(n_j, u_k, n_\ell) = n_\ell$

LABEL:     $B \rightarrow U$,    LABEL$(n_j, u_k, n_\ell) = u_k$

NODES:     $G \rightarrow 2^N$,  NOF$(g_i) = \{n_j \mid n_j \in g_i\}$

BRANCHES:  $G \rightarrow 2^B$,  BOF$(g_i) = \{b_j \mid b_j \in g_i\}$

ADJBRS:    $N \rightarrow 2^B$,  ADJBR$(n_i) = \{b_j \mid b_j = (n_i, u_k, n_\ell)$ or $(n_\ell, u_k, n_i)\}$

INCBRS:    $N \rightarrow 2^B$,  INCBR$(n_i) = \{b_j \mid b_j = (n_\ell, u_k, n_i)\}$

OUTBRS:    $N \rightarrow 2^B$,  OUTBR$(n_i) = \{b_j \mid b_j = (n_i, u_k, n_\ell)\}$

NAME:      $\{n \mid b \mid g\} \rightarrow L$, where $L$ is a set of labels and $U \subset L$

These are a few of the functions which operate on the graph elements.  See
Appendix A for other functions.
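These accessor functions translate almost directly into code.  The Python
sketch below assumes a graph stored as a pair (nodes, branches) with
branches as (tail, label, head) triples, and passes the graph explicitly to
the branch-set functions; the lower-case names mirror the functions above
but are otherwise hypothetical.

```python
def tail(branch):
    return branch[0]


def label(branch):
    return branch[1]


def head(branch):
    return branch[2]


def nof(graph):
    """Set of nodes of a graph."""
    return set(graph[0])


def bof(graph):
    """Set of branches of a graph."""
    return set(graph[1])


def outbr(graph, node):
    """Branches leaving the node."""
    return {b for b in bof(graph) if tail(b) == node}


def incbr(graph, node):
    """Branches entering the node."""
    return {b for b in bof(graph) if head(b) == node}


def adjbr(graph, node):
    """Branches incident to the node in either direction."""
    return incbr(graph, node) | outbr(graph, node)
```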

2.4   Linguistic  Description 

As we have explained so far, although string languages can be useful in
preprocessing stages, they are inadequate for global analysis of pictures.
Our structure operation language (SOL) can be used to declare graphical
representations of scenes statically or, with the help of a preprocessing
procedure, to create graphs dynamically.  In Fig. 2.10(a) we have an example
of a scene whose graphical representation is given pictorially in (b) and
linguistically in (c) as a SOL declaration.  Here, the branch labels have
the following interpretations:

    DLOF:      Directly left of
    DROF:      Directly right of
    CONTAIN:   Contains
    DABOVE:    Directly above.

For example, DLOF BR (N1,N3) means that region N1 is located directly on the
left of region N3.


[Figure panels: (a) scene; (b) relational graph SCENE]

(c)   SOL declaration of the scene graph:

    DCL SCENE GRAPH,
        DLOF    BR (N1,N3),
        DROF    BR (N4,N3),
        CONTAIN BR (N3,N2),
        DABOVE  BR (N4,N6),
        DABOVE  BR (N4,N7),
        DROF    BR (N6,N7),
        CONTAIN BR (N4,N5),
        DLOF    BR (N4,N8);

    Associate statements are used to assign the values of node attributes.

Fig. 2.10.   Linguistic description of a scene.


As we mentioned before, any kind of general data structure can be associated
with any element of our graph; these structures are the attributes, which
convey semantical information about the picture.

For example, if attribute $a_i$ conveys the shape information about primitive
regions of the picture and has the following PL/I structure,
    DCL 1 AI,
          2 F(10) BIN FIXED,
          2 NAME CHAR(8),
then we can associate this structure with all the nodes of the graph in
Fig. 2.10(b) with the following SOL statement:
    NDASOC (NOF(SCENE)) 1 AI,
          2 F(10) BIN FIXED,
          2 NAME CHAR(8);
Now we have ten shape features and one eight-character string associated
with every node of the graph in Fig. 2.10(b). Attributes can be used both
as functions and as pseudo-variables. For example, if the following
structure V holds the values to be attributed to node N3 of graph SCENE,

    1 V,
      2 ARRAY(10) BIN FIXED,
      2 STRING CHAR(8);

then it can be assigned with the following statement:

    AI(SCENE.N3) = V,
and the values of the attributes can be retrieved in the same manner:

    V = AI(SCENE.N3).
For a detailed description of the language see Appendix A. Pointers are
used extensively when name references are ambiguous.
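
The declaration of Fig. 2.10(c) and the attribute association described above can be mimicked in a modern language. The following is only an illustrative Python sketch, not SOL itself: the class `SceneGraph`, its method names, and the sample value "SQUARE" are assumptions introduced here, while the branch list is copied from the SOL declaration.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """A labeled, directed graph with per-node attribute records (hypothetical helper)."""
    branches: set = field(default_factory=set)      # (tail, label, head) triples
    attributes: dict = field(default_factory=dict)  # node -> {attribute name: value}

    def add_branch(self, label, tail, head):
        self.branches.add((tail, label, head))

    def nodes(self):
        return {n for (tail, _, head) in self.branches for n in (tail, head)}

    def associate(self, name, make_default):
        # Rough analogue of NDASOC (NOF(SCENE)): attach one record to every node.
        for n in self.nodes():
            self.attributes.setdefault(n, {})[name] = make_default()


scene = SceneGraph()
for label, tail, head in [
    ("DLOF", "N1", "N3"), ("DROF", "N4", "N3"), ("CONTAIN", "N3", "N2"),
    ("DABOVE", "N4", "N6"), ("DABOVE", "N4", "N7"), ("DROF", "N6", "N7"),
    ("CONTAIN", "N4", "N5"), ("DLOF", "N4", "N8"),
]:
    scene.add_branch(label, tail, head)

# Ten shape features and an eight-character name for every node.
scene.associate("AI", lambda: {"F": [0] * 10, "NAME": ""})

# Assignment and retrieval, in the spirit of AI(SCENE.N3) = V and V = AI(SCENE.N3).
scene.attributes["N3"]["AI"] = {"F": list(range(10)), "NAME": "SQUARE"}
v = scene.attributes["N3"]["AI"]
```

A dictionary keyed by node is used here only for brevity; the report itself relies on PL/I structures and pointers.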


3.   GRAPH  TRANSFORMATIONS 


In chapter 2 we showed how scenes can be represented in a natural
way through graphical description. Guided by a conceptual model of the
universe, graph transformations are applied to the scene graphs to transform
them into more readily interpretable structures. Transformations can also
act on the model objects of the universe to produce variations of these
models.

Graph transformations are formally defined in the same manner as by Schwebel
in [17]: A graph transformation, T, between two graphs $g_1$ and $g_2$ is a
relation $T \subseteq g_1' \times g_2'$, where $g_i' = g_i \cup \{\lambda\}$.
Here we assume an element, $\lambda$, called the empty element, which is
associated with each graph. An element of $g_1$ mapped into $\lambda$ only is
said to be deleted by the transformation, and an element of $g_2$ which is
the image of $\lambda$ only is said to be created by the transformation.

The inverse of a graph-structure transformation T is denoted $T^{-1}$ and is
the inverse of the relation. That is,
$$T^{-1} = \{(y,x) \mid (x,y) \in T\}.$$

Note:   Delete is the inverse of create.

In general, in our picture processing application we are interested in
contracting (many-to-one) transformations, i.e., $|g_1| > |g_2|$.
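
The definitions above translate directly into set operations. The sketch below is an illustration only: it represents a transformation as a Python set of (element, image) pairs, and the names `LAMBDA`, `inverse`, `deleted`, `created` and the toy relation `T` are assumptions made for the example.

```python
LAMBDA = "λ"  # stands for the empty element


def inverse(T):
    """Inverse relation: {(y, x) | (x, y) in T}."""
    return {(y, x) for (x, y) in T}


def deleted(T):
    """Elements of the first graph whose only image is the empty element."""
    images = {}
    for x, y in T:
        images.setdefault(x, set()).add(y)
    return {x for x, ys in images.items() if x != LAMBDA and ys == {LAMBDA}}


def created(T):
    """Elements of the second graph that are the image of the empty element only."""
    return deleted(inverse(T))


# Toy contracting transformation: a and b collapse into n, c is deleted, d is created.
T = {("a", "n"), ("b", "n"), ("c", LAMBDA), (LAMBDA, "d")}
```

Defining `created` through `deleted(inverse(T))` mirrors the note that delete is the inverse of create.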

3.1   Domain  and  Range  of  Transformations 

Transformations  act  on  a  graph  (nodes  and  branches),  and  produce  a 
graph  (nodes  and  branches).   Because  we  wish  to  allow  contextual  elements 
to  be  included  in  the  domain  or  range  of  the  transformation,  the  relation 
need  not  be  everywhere-defined  nor  onto.   In  fact,  those  parts  of  the  graph 
which  are  not  affected  by  transformation  remain  intact  as  parts  of  the  newly 
formed  graph. 




Fig. 3.1(a) is an example of a scene; (b) is its graphical description;
(c) defines a transformation $T_1$ which maps nodes and branches into a
branch, and nodes and branches into a node; (d) defines another
transformation, $T_2$, which maps branches into a branch, and nodes and
branches into a node.

The empty element, $\lambda$, is introduced to specify creations and
deletions. We could have defined unmapped elements to be deleted, but that
would have eliminated the possibility of having elements in the graph which
are present only to establish context and which are neither mapped nor
deleted.
Definition:   The domain of a transformation is that part of the graph which
is mapped or deleted. The range of a transformation is that part of the newly
formed graph into which the elements of the domain are mapped.

The element mapping and the presence of the empty element in a
transformation also allow us to define the memory of an element in the range
of a transformation. This is a useful concept when considering conditions
for performing the inverse transformation, and it simplifies the
representation of a series of graph transformations. The memory, denoted
M(e), of an element e which is in the range of the transformation $T_j$ is
the inverse image of e under $T_j$, that is, $M(e) = T_j^{-1}(e)$. Normally,
in a series of transformations, $T_j$ will be the last transformation which
had e in its range.

Then, after applying a transformation $T \subseteq g_1' \times g_2'$ to a
graph $G$ ($g_1 \subseteq G$), we have available the transformed graph $G'$
($g_2 \subseteq G'$), which contains the range of T, and the history of the
transformation, represented by the memory of the elements of $g_2$: M(e) for
$e \in g_2$ ($e \neq \lambda$ if e is a node). This simplifies the
representation of $g_2$, which will then be the domain for further
transformations.
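
As a small illustration of the memory concept, the following Python sketch computes M(e) as an inverse image and, for a series of transformations, takes it from the last transformation that has e in its range. The function names and the two toy relations are assumptions made for this example; they are not taken from the report's implementation.

```python
def memory(T, e):
    """M(e): the inverse image of e under the relation T."""
    return {x for (x, y) in T if y == e}


def memory_in_series(series, e):
    """Memory of e under the last transformation of the series with e in its range."""
    for T in reversed(series):
        m = memory(T, e)
        if m:
            return m
    return set()


# First step: nodes a2, a3 and the branch between them collapse into node n1.
T1 = {("a2", "n1"), ("a3", "n1"), (("a2", "C", "a3"), "n1"), ("a1", "a1")}
# Second step: n1, a1 and the branch joining them collapse into node p.
T2 = {("n1", "p"), ("a1", "p"), (("a1", "A", "n1"), "p")}
```

Keeping only M(e) for each element of the range is what lets a long series of transformations be undone one node at a time.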

One  problem  which  arises  in  a  contextual  transformation  is  the  fate  of 
relations  between  the  nodes  in  the  domain  and  other  nodes  not  in  the  domain. 


(a)   scene (regions R1, R2, R3, R4, R5)

(b)   graphical representation

(c)   (node, branch) → branch
      (node, branch) → node
      transformation $T_1$

      $T_1$(R1) = R1
      $T_1$(R5) = R5
      $T_1$((R1,DABOVE,R2), (R1,DABOVE,R3),
            (R2,DLOF,R4), (R4,DLOF,R3),
            (R2,DABOVE,R5), (R3,DABOVE,R5),
            R2, R4, R3) = (R1,DABOVE,R5)

(d)   (node, branch) → node
      (branch) → branch
      transformation $T_2$

      $T_2$(R1) = R1
      $T_2$(R5) = R5
      $T_2$((R1,DABOVE,R2), (R1,DABOVE,R3)) = (R1,DABOVE,R')
      $T_2$((R2,DABOVE,R5), (R3,DABOVE,R5)) = (R',DABOVE,R5)
      $T_2$((R2,DLOF,R4), (R4,DLOF,R3), R2, R4, R3) = R'

Fig. 3.1.   Examples of graph transformations.



Associated with each transformation, T, there must be another transformation
which is a function of these branches as well as of T, and which maps these
branches to another set of branches and the empty element $\lambda$. Let us
call this an embedding transformation, ET.

In Fig. 3.2, the embedding branch labeled A is mapped into a branch with the
same label A, and the embedding branch labeled B is mapped into $\lambda$; in
other words, it is deleted. The graphs $g_1$ and $g_2$ are encircled and are
referred to as the domain sphere and the range (or transformed) sphere,
respectively, of T. The memory of a range node,
$\{a_2, a_3, (a_2, C, a_3)\}$, is shown cross-hatched.

3.2   Parsing vs. Transformation

Let  us  impose  the  following  restrictions  on  the  transformations. 

1.  Branches  and  nodes  are  mapped  into  nodes,  and  for  every  branch  mapped 
into  a  node,  its  end  nodes  must  also  map  into  the  same  node. 

2.  Only  branches  are  mapped  into  a  branch. 

3.  The mapping is a functional mapping, i.e. many-into-one and not one-into-many.
Theorem  1:   With  the  above  restrictions,  we  can  represent  every  transformation 
as  a  series  of  transformations  with  the  following  characteristics: 

a)  range  of  each  transformation  is  a  single  node,  so  that  every  node 
in  the  range  of  T  defines  a  transformation. 

b)  for  each  of  the  above  transformations  the  embedding  branches  are 
defined  the  same  way  as  in  T  or  as  in  its  embedding  function  ET, 
whichever  may  be  the  case. 

Fig. 3.2.   Illustration of transformation domain and range and memory.

Proof:   The proof is straightforward. Assume $G_i = (N_i, B_i)$ and
$$B_i = B_{i1} \cup B_{i2} \cup B_{i3} \cup B_{i4},$$
where
$B_{i1}$ :   set of branches cutting the domain of T,
$B_{i2}$ :   set of branches mapping into a node,
$B_{i3}$ :   set of branches mapping into a branch,
$B_{i4} = B_i - \{B_{i1}, B_{i2}, B_{i3}\}$.

Let m be the number of nodes in the range of transformation T; then we show
that
$$T = T_1 T_2 \cdots T_m,$$
where each $T_i$ is in the class of transformations defined above.

Each $T_i$ is a mapping function which maps the elements from the domain
of T to the node corresponding to $T_i$. We define the embedding function
$ET_i$ of $T_i$ as follows: for each branch b cutting the domain of $T_i$,
either
    $b \in B_{i1}$, in which case $ET_i(b) = ET(b)$,
or
    $b \in B_{i3}$, in which case $ET_i(b) = T(b)$.
In all other cases we do an identity embedding.

Identity embedding:   If $b = (a_i, u, a_j)$ or $b = (a_j, u, a_i)$, where
$a_i$ belongs to the domain of $T_k$ and $a_j$ is outside of its domain, then
$IE(b) = (n_k, u, a_j)$ or $(a_j, u, n_k)$, correspondingly, where $n_k$ is
the range (node) of $T_k$.

Fig. 3.3 shows the split of transformation T of Fig. 3.2 into two
transformations $T_1$ and $T_2$.

With the above three simple restrictions on transformations and Theorem
1, we have shown that our class of transformations, which have a single-node
range, is quite general. This provides us with a built-in hierarchy,
which facilitates the inverse transformations ($T^{-1}$) at the single-node
level.
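
The decomposition into single-node transformations can be sketched as a sequence of node contractions, each using the identity embedding for branches that cut its domain. The Python below is a simplified illustration under that assumption (every cutting branch is embedded by identity, none is deleted); the function names, node names and branch labels are hypothetical.

```python
def contract(branches, domain, new_node):
    """Map the nodes of `domain`, and the branches among them, into `new_node`.

    Branches cutting the domain are re-attached to `new_node` (identity embedding);
    branches entirely outside the domain are left intact.
    """
    out = set()
    for tail, label, head in branches:
        if tail in domain and head in domain:
            continue  # internal branch: absorbed into new_node
        out.add((new_node if tail in domain else tail,
                 label,
                 new_node if head in domain else head))
    return out


def apply_series(branches, steps):
    """Apply single-node transformations one after another."""
    for domain, new_node in steps:
        branches = contract(branches, domain, new_node)
    return branches


g = {("a1", "A", "a2"), ("a2", "C", "a3"), ("a3", "E", "a4"), ("a4", "D", "a5")}
steps = [({"a2", "a3"}, "n1"), ({"a4", "a5"}, "n2")]
result = apply_series(g, steps)
```

In this toy case the two steps have disjoint domains, so applying them in either order gives the same graph.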


Fig. 3.3.   Implementing transformation T in steps $T_1$ and $T_2$
            (panels: $G$, $G_1$, $G_2 = G'$).

3.3   Validity of Embedding Relations

An  application  of  a  transformation  is  valid  if  the  resulting  graph  is 
valid  as  a  representation  of  a  physical  situation.   This  validity  will 
depend  upon  the  logical  definitions  of  the  system.   For  example,  if  composites 
represent  unions  of  set  elements,  and  branches  represent  binary  relations 
between  elements,  then  the  validity  of  the  transformation  depends  upon  the 
validity  of  the  relations  created  by  the  transformation.   We  can  make  a 
transformation  valid  by  defining  a  proper  embedding  function  for  the 
transformation. 

T is reversible if and only if T is valid and $T^{-1}$ is valid, given the
memory of T. In other words, a reversible transformation is one which can be
validly inverted, assuming the memory of the transformation is available.
A reversible transformation is said to be information lossless.

One important criterion in selecting our class of transformations is
the fact that the memory of a transformation can be saved entirely in the
single node of the range. If our embedding function dictates that an
embedding relation cannot exist, we create a temporary branch whose
memory is the branch which had to be deleted. This gives full
reversibility to the class of transformations which we defined in 3.2.
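
A reversible single-node transformation can be sketched by recording, next to the new node, everything that was mapped into it. In the Python illustration below the memory is a plain dictionary holding the domain nodes, the internal branches, and the original branches behind each embedded branch; this layout, the function names, and the use of the region names of Fig. 3.1 are assumptions made for the example.

```python
def contract(branches, domain, new_node):
    """Contract `domain` into `new_node`, returning the new branches and the memory."""
    kept = set()
    memory = {"nodes": set(domain), "internal": set(), "cut": {}}
    for b in branches:
        tail, label, head = b
        if tail in domain and head in domain:
            memory["internal"].add(b)                 # absorbed into new_node
        elif tail in domain or head in domain:
            image = (new_node if tail in domain else tail,
                     label,
                     new_node if head in domain else head)
            memory["cut"].setdefault(image, set()).add(b)  # memory of the new branch
            kept.add(image)
        else:
            kept.add(b)                               # untouched context
    return kept, memory


def expand(branches, memory):
    """Inverse transformation: restore the graph exactly from the memory."""
    restored = {b for b in branches if b not in memory["cut"]}
    restored |= memory["internal"]
    for originals in memory["cut"].values():
        restored |= originals
    return restored


g = {("R1", "DABOVE", "R2"), ("R1", "DABOVE", "R3"),
     ("R2", "DLOF", "R4"), ("R4", "DLOF", "R3"),
     ("R2", "DABOVE", "R5"), ("R3", "DABOVE", "R5")}
contracted, mem = contract(g, {"R2", "R3", "R4"}, "R'")
```

Because the memory keeps every original branch, `expand` plays the role of the back-up procedure that restores the environment after a trial transformation.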

In  parsing  graphs,  a  back-up  procedure,  which  restores  the  environment 
exactly  as  it  was  before  the  transformation  was  applied,  is  an  absolutely 

necessary  part  of  the  system. 

The value of the embedding function depends on the values of the variables
in a transformation (nodes and branches in the domain and their attributes;
branches cutting the domain of the transformation and their attributes).
Fig. 3.4(a) shows that for transformation T, the binary relation "CONTAIN" is
preserved, while "DABOVE" had to be deleted. In Fig. 3.4(b) the same "DABOVE"
relation is preserved, because now we have branch (1,DABOVE,3) instead of
branch (1,DABOVE,2) as in (a).

Fig. 3.4.   An example of an embedding branch depending on the variables of
            the domain. (CONTAIN: contains; DABOVE: directly above.)


In [17] Schwebel has considered general rules for embedding a graph
$g_2$, which has resulted from a transformation
$T \subseteq g_1' \times g_2'$, into the graph of the domain $g_1$. He treats
these rules as properties of the relations, holding for arbitrary
transformations. We know that this is not generally true for most common
relations: the embedding has to be defined in connection with the
transformation and the branch itself, not only the branch label. His
embedding rules determine only whether a branch (n,u,m), where n is outside
of and m is within the domain sphere, implies a new branch (n,u,t), where t
is the range node inside the transformed sphere. The decision as to whether
a branch exists depends upon the inverse image, $T^{-1}(t)$, of the range
$g_2$ and its context. He defines the three following embedding types.
Assume $n, m \in N$, $n \notin g_1$, $m \in g_1$, $u \in U$, $t \in T(m)$,
$t \neq \lambda$ (in our class of transformations t is the only element of
$g_2$).
Simple Embedding Type

OR    (n,u,m) for any $m \in T^{-1}(t)$  $\Rightarrow$  (n,u,t)

AND   (n,u,m) for all $m \in T^{-1}(t)$  $\Rightarrow$  (n,u,t)

NOT   (n,u,m) for no $m \in T^{-1}(t)$   $\Rightarrow$  (n,u,t).
Thus, embedding types allow us to determine connections from an element
outside the domain sphere to the elements within the transformed sphere as
logical functions of the elements of the inverse image of the new element
and their context.

Let NODES$(T^{-1}(t)) = \{m_1, m_2, \ldots, m_k\}$ and $b = (n,u,t)$,
$b_i = (n,u,m_i)$, as shown in Fig. 3.5. Let $f_i$ be a predicate which is
true if and only if branch $b_i$ exists; similarly, f is a predicate which is
true if and only if the branch b exists. Then the embedding types OR, AND,
and NOT are expressed in terms of predicates as given below:

OR    $\bigvee_i f_i \Rightarrow f$

AND   $\bigwedge_i f_i \Rightarrow f$

NOT   $\neg(\bigvee_i f_i) \Rightarrow f$

Fig. 3.5.   Simple embedding types.


But as we mentioned before, these types are defined as properties of the
relations, and very few relations have such a property. The best way to
investigate this is to define embedding functions for a set of actual
relations.
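
The three simple embedding types are just Boolean combinations of the predicates $f_i$. The Python sketch below shows this; the function name `simple_embedding` and the sample branches are assumptions made for illustration.

```python
def simple_embedding(kind, branches, n, u, inverse_image_nodes):
    """Decide whether branch (n, u, t) exists for the range node t.

    `inverse_image_nodes` are the nodes mapped into t; the i-th predicate is
    true when the branch (n, u, m_i) exists in `branches`.
    """
    f = [(n, u, m) in branches for m in inverse_image_nodes]
    if kind == "OR":
        return any(f)
    if kind == "AND":
        return all(f)
    if kind == "NOT":
        return not any(f)
    raise ValueError(f"unknown embedding type: {kind}")


G = {("n", "u", "m1"), ("n", "u", "m2"), ("n", "w", "m3")}
nodes = ["m1", "m2", "m3"]
```

With this graph, label `u` embeds under OR but not under AND, because the branch to `m3` is missing.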

3.4   One  Class  of  Relations 

In Table 3.1 we show a set of relations which are very useful in
picture processing. This set of relations is closed under inversion,
i.e. if relation r is in the set, so is $r^{-1}$.

Table 3.2 shows some of the basic properties of these relations. In
[22], B. H. McCormick has given an extensive account of the properties
of this type of relation.
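
Closure under inversion is easy to check mechanically. The sketch below encodes the inverse column of Table 3.1 as a Python dictionary; the names `INVERSE` and `invert_branch` are introduced here for illustration only.

```python
# Inverse of each relation, as listed in Table 3.1.
INVERSE = {
    "NEXT": "NEXT",
    "CONTAIN": "INSIDE", "INSIDE": "CONTAIN",
    "LOF": "ROF", "ROF": "LOF",
    "ABOVE": "BELOW", "BELOW": "ABOVE",
    "INB": "INF", "INF": "INB",
    "DLOF": "DROF", "DROF": "DLOF",
    "DABOVE": "DBELOW", "DBELOW": "DABOVE",
    "DINB": "DINF", "DINF": "DINB",
}


def invert_branch(branch):
    """Turn (tail, r, head) into the equivalent branch (head, r^-1, tail)."""
    tail, label, head = branch
    return (head, INVERSE[label], tail)


closed = set(INVERSE.values()) == set(INVERSE)          # closed under inversion
involution = all(INVERSE[INVERSE[r]] == r for r in INVERSE)
```

For example, `invert_branch(("N1", "DLOF", "N3"))` gives `("N3", "DROF", "N1")`.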

3.4.1  Embedding  function  for  this  class  of  relation 

As we have seen, the value of the embedding function depends not only on
the branches which cut into the domain of the transformation, but also
on the variables of the transformation. Here, we study the case where the
embedding function depends on the properties of the binary relations in the
domain and of those which cut the boundary of the domain.

We use the same nomenclature as in Fig. 3.5.

NEXT:   node n must be in relation "NEXT" to a node in the region which is
not inside any other node (object) of the domain;


Table 3.1
A Class of Relations

No.  Relation  Explanation; inverse

 1   NEXT      tail node object and head node object are touching in more
               than one point; $(NEXT)^{-1} = NEXT$
 2   CONTAIN   tail node object contains head node object;
               $(CONTAIN)^{-1} = INSIDE$
 3   INSIDE    tail node object is inside the head node object;
               $(INSIDE)^{-1} = CONTAIN$
 4   LOF       tail node object is located at the left of head node object
               and not touching; $(LOF)^{-1} = ROF$
 5   ROF       tail node object is located at the right of head node object
               and not touching; $(ROF)^{-1} = LOF$
 6   ABOVE     tail node object is located above head node object and not
               touching; $(ABOVE)^{-1} = BELOW$
 7   BELOW     tail node object is below head node object and not touching;
               $(BELOW)^{-1} = ABOVE$
 8   INB       tail node object is located in the back of head node object
               and not touching; $(INB)^{-1} = INF$
 9   INF       tail node object is located in front of head node object and
               not touching; $(INF)^{-1} = INB$
10   DLOF      tail node object is located at the left of head node object
               and touching; $(DLOF)^{-1} = DROF$
11   DROF      tail node object is located at the right of head node object
               and touching; $(DROF)^{-1} = DLOF$
12   DABOVE    tail node object is above head node object and touching;
               $(DABOVE)^{-1} = DBELOW$
13   DBELOW    tail node object is below head node object and touching;
               $(DBELOW)^{-1} = DABOVE$
14   DINB      tail node object is in the back of head node object and
               touching; $(DINB)^{-1} = DINF$
15   DINF      tail node object is in front of head node object and
               touching; $(DINF)^{-1} = DINB$



Table 3.2
Few Properties of the Relations

Relation    Symmetric / Asymmetric / Trans. / Intrans.

NEXT        1 / 0 / 0 / 1
CONTAIN     0 / 1 / 1 / 0
INSIDE      0 / 1 / 0
LOF         0 / 1 / 0
ROF         0 / 1 / 0
ABOVE       0 / 0
BELOW       0 / 0
INB         0 / 0
INF         0 / 0
DLOF        0 / 0 / 1
DROF        0 / 0 / 1
DABOVE      0 / 0 / 1
DBELOW      0 / 0 / 1
DINB        0 / 0 / 1
DINF        0 / 0 / 1



i.e.
    $(n, NEXT, m_i) \in G$ where $m_i \in g_1$ and $\forall m_j \in g_1$, $j \neq i$:
    $(m_i, INSIDE, m_j) \notin g_1$,
    $(m_j, CONTAIN, m_i) \notin g_1$.

CONTAIN:   This relation is an AND embedding type. Since this relation
is transitive, not all the branches need to be present;
i.e. $\forall m_i$ in $g_1$:
    $(n, CONTAIN, m_i) \in G$ | $(m_j, CONTAIN, m_i) \in g_1$ and $(n, CONTAIN, m_j) \in G$
or  $(m_i, INSIDE, n) \in G$ | $(m_i, INSIDE, m_j) \in g_1$ and $(m_j, INSIDE, n) \in G$,
    $j \neq i$, $m_j \in g_1$.
We assume that every time a non-existing branch is implied by transitivity,
it is created.

INSIDE:   This relation is an OR embedding type;
i.e. $(n, INSIDE, m_i) \in G$ where $m_i \in g_1$
is a necessary and sufficient condition.

DABOVE:ABOVE:   The node in question must be in "DABOVE"/"ABOVE" relation with
a node inside the domain which is not contained in any node (object) of the
domain. (Remember that neighborhood relations are defined only between
brother nodes in a containment tree.) I.e.
    $(n, DABOVE, m_i) \in G$ where $m_i \in g_1$ and $\forall m_j \in g_1$, $j \neq i$:
    $(m_i, INSIDE, m_j)$ | $(m_j, CONTAIN, m_i) \notin g_1$;
    $(n, ABOVE, m_i) \in G$ where $m_i \in g_1$ and $\forall m_j \in g_1$, $j \neq i$:
    $(m_i, INSIDE, m_j)$ and $(m_j, CONTAIN, m_i) \notin g_1$.

DBELOW:BELOW:   The node in question must be in "DBELOW"/"BELOW" relation with
a node inside the domain which is not contained in any node (object) of the
domain; i.e.
    $(n, DBELOW, m_i) \in G$ where $m_i \in g_1$ and $\forall m_j \in g_1$, $j \neq i$:
    $(m_i, INSIDE, m_j)$ | $(m_j, CONTAIN, m_i) \notin g_1$;
    $(n, BELOW, m_i) \in G$ where $m_i \in g_1$ and $\forall m_j \in g_1$, $j \neq i$:
    $(m_i, INSIDE, m_j)$ | $(m_j, CONTAIN, m_i) \notin g_1$.

DLOF:LOF:   The node in question must be in relation "DLOF"/"LOF" to a node
in the domain which is not contained in any node (object) of the domain; i.e.
    $(n, DLOF, m_i) \in G$ where $m_i \in g_1$ and $\forall m_j \in g_1$, $j \neq i$:
    $(m_i, INSIDE, m_j)$ | $(m_j, CONTAIN, m_i) \notin g_1$;
    $(n, LOF, m_i) \in G$ where $m_i \in g_1$ and $\forall m_j \in g_1$, $j \neq i$:
    $(m_i, INSIDE, m_j)$ | $(m_j, CONTAIN, m_i) \notin g_1$.

DROF:ROF:   The node in question must be in relation "DROF"/"ROF" to a node
in the domain which is not contained in any node (object) of the domain; i.e.
    $(n, DROF, m_i) \in G$ where $m_i \in g_1$ and $\forall m_j \in g_1$, $j \neq i$:
    $(m_i, INSIDE, m_j)$ | $(m_j, CONTAIN, m_i) \notin g_1$;
    $(n, ROF, m_i) \in G$ where $m_i \in g_1$ and $\forall m_j \in g_1$, $j \neq i$:
    $(m_i, INSIDE, m_j)$ | $(m_j, CONTAIN, m_i) \notin g_1$.

DINB:INB:   The node in question must be in relation "DINB"/"INB" to a node
in the domain which is not contained in any node (object) of the domain; i.e.
    $(n, DINB, m_i) \in G$ where $m_i \in g_1$ and $\forall m_j \in g_1$, $j \neq i$:
    $(m_i, INSIDE, m_j)$ | $(m_j, CONTAIN, m_i) \notin g_1$;
    $(n, INB, m_i) \in G$ where $m_i \in g_1$ and $\forall m_j \in g_1$, $j \neq i$:
    $(m_i, INSIDE, m_j)$ | $(m_j, CONTAIN, m_i) \notin g_1$.

DINF:INF:   The node in question must be in relation "DINF"/"INF" to a node
in the domain which is not contained in any node (object) of the domain; i.e.
    $(n, DINF, m_i) \in G$ where $m_i \in g_1$ and $\forall m_j \in g_1$, $j \neq i$:
    $(m_i, INSIDE, m_j)$ | $(m_j, CONTAIN, m_i) \notin g_1$;
    $(n, INF, m_i) \in G$ where $m_i \in g_1$ and $\forall m_j \in g_1$, $j \neq i$:
    $(m_i, INSIDE, m_j)$ | $(m_j, CONTAIN, m_i) \notin g_1$.
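
The rule shared by NEXT and all the neighborhood relations can be written as one small predicate: the outside node must be related to a domain node that is not contained in another domain node. The Python sketch below is an illustration under that reading; the function names and the two-node domain are hypothetical, chosen to resemble the situation of Fig. 3.4 rather than to reproduce it.

```python
def outermost(domain_nodes, domain_branches):
    """Domain nodes that are not contained in any other node of the domain."""
    inner = set()
    for tail, label, head in domain_branches:
        if label == "CONTAIN":
            inner.add(head)   # head lies inside tail
        elif label == "INSIDE":
            inner.add(tail)   # tail lies inside head
    return set(domain_nodes) - inner


def neighbor_embeds(G, n, relation, domain_nodes, domain_branches):
    """True if branch (n, relation, t) should exist for the range node t."""
    candidates = outermost(domain_nodes, domain_branches)
    return any((n, relation, m) in G for m in candidates)


domain_nodes = {"2", "3"}
domain_branches = {("3", "CONTAIN", "2")}          # region 3 contains region 2
G_a = domain_branches | {("1", "DABOVE", "2")}      # relation to the inner node
G_b = domain_branches | {("1", "DABOVE", "3")}      # relation to the outer node
```

Only the second graph keeps the DABOVE branch after the domain is contracted, since region 2 is hidden inside region 3.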




3.4.2   Other useful operations on this set of relations
There are some other useful operations on the set S of relations, i.e.
relations $T_i \subseteq S \times S$, under which the set is closed. Tables
3.3(a)-(d) define three of these operations, which physically correspond to
rotating the scene by 90°, 180°, and 270°, respectively. As we see from
Table 3.3(a), for each relation only one value is defined, so $T_1$ is in
fact a function which maps each element of S to a single element of S; we
have therefore chosen a functional representation for $T_2$ and $T_3$. It is
also easy to see the following properties:

$$T_1 T_1 = T_2$$

$$T_1 T_2 = T_2 T_1 = T_3$$

$$T_1 T_1 T_1 = T_3$$

We make use of these operations to define certain transformations on
general models to obtain different projections of the objects they represent.

Table 3.3(a)
$T_1$ (90° rotation)

Relation matrix over $S \times S$. Rows and columns are indexed by
s = 1, ..., 15, in the order NEXT, CONTAIN, INSIDE, LOF, ROF, ABOVE, BELOW,
INB, INF, DLOF, DROF, DABOVE, DBELOW, DINB, DINF; each row carries a single X.


Table 3.3(b)
Function $T_1$ (Rotation 90°)

s          1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
$T_1(s)$   1   2   3   8   9   6   7   5   4  14  15  12  13  11  10


Table 3.3(c)
Function $T_2$ (Rotation 180°)

s          1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
$T_2(s)$   1   2   3   5   4   6   7   9   8  11  10  12  13  15  14


Table 3.3(d)
Function $T_3$ (Rotation 270°)

s          1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
$T_3(s)$   1   2   3   9   8   6   7   4   5  15  14  12  13  10  11

4.   MODELS  AND  PARSING 

Communication between two different media occurs only through an
intermediate medium interpretable by both. Meaningful oral conversation can
occur between two human beings if each knows the language used by the
other party. Otherwise, a translator converts the information into a form
acceptable to the listener. Written languages are understood only by
individuals who are able to transform them into internal forms of
knowledge. An illiterate man will receive more or less the same information
if a written paragraph is read to him, provided he is knowledgeable about the
written fact. This suggests that the internal form of knowledge is more
or less independent of the methods used in tapping this knowledge.

Of  course  we  are  not  in  any  way  suggesting  that  the  growth  of  this 
knowledge  itself  is  independent  of  the  accessing  methods  used.   On  the 
contrary,  we  have  learned  from  our  experience  with  computers  that  the 
availability  of  higher  level  languages  has  made  the  formulation  of  problems 
otherwise  extremely  difficult,  quite  feasible. 

Pictures are a universal means of conveying messages, since they are
learned from natural environments which are more or less common to all human
beings. A newborn infant, before developing any linguistic capability, is
able to perform simple pattern recognition activities. The Chinese still use
ideographs to convey messages (川 = river), and many natural languages
retain ideographic elements even today.

The  argument  here  is  that  knowledge  should  be  represented  in  a  hier- 
archical structure  which  can  be  easily  modified  and  expanded.   This  structure 
should  handle  conceptual  abstraction  at  all  the  levels,  independent  of 
irrelevant  details. 




A human statement about a fact is a collection of sentences which, it is
hoped, convey the intended meaning to the receiver. For example,
A boy is standing on the ground.
A ball is lying on the ground.
The boy is playing with the ball.
Here the objects can be defined either in context (through their associations
with other objects at the same level of abstraction) or in terms of their
constituents (through the hierarchical structure of the knowledge). Graphical
models are a valid approach to representing this knowledge. The facts relayed
by the sentences in this example can be represented as in Fig. 4.1. The
analysis is carried out by proving the theorem (goal) stated by each branch
of this graph, using the body of knowledge present in the intelligent memory.
Like a program in a conventional programming language, scenes and
pictures can be thought of as a collection of sentences, where each sentence
describes an object. Relations between parts of different sentences can be
translated into relations between the objects themselves. We also know that
pictures are context sensitive. For example, finding a human head and
body in the scene does not necessarily mean that we have discovered a man,
unless the head is connected to the body.

Even in string grammars, where we have only a simple adjacency relationship
between the elements of the language, grammars like that of Fig. 4.2(a) are
context sensitive. Fig. 4.2(b) is a sentence in this language and (c) is the
parsing sequence for this sentence. Rules like aB → ab can be applied only
when "B" is preceded by "a", which establishes the required context for this
replacement. In picture processing we are faced with several relationships
between the picture primitives, which make the contextual representation far
more difficult than a simple adjacency relation. Relational graphs are an
excellent method for representing this complex contextual information.


Fig. 4.1.   Graphical representation of a collection of English
            sentences.

(a)   grammar

      g:  S → aSBC | aBC      bB → bb
          CB → BC             bC → bc
          aB → ab             cC → cc

(b)   aabbcc

(c)   parsing of the sentence (b)

      S → aSBC → aaBCBC → aabCBC → aabBCC → aabbCC → aabbcC → aabbcc

Fig. 4.2.   A context sensitive language and parsing.
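
The parsing sequence of Fig. 4.2(c) can be checked step by step: each string must follow from the previous one by rewriting a single occurrence of some left-hand side. The Python sketch below does this; the rule list is the grammar of Fig. 4.2(a), while the function name `one_step` is introduced here for illustration.

```python
RULES = [("S", "aSBC"), ("S", "aBC"), ("CB", "BC"),
         ("aB", "ab"), ("bB", "bb"), ("bC", "bc"), ("cC", "cc")]


def one_step(src, dst):
    """True if dst results from src by applying one rule at one position."""
    for lhs, rhs in RULES:
        start = src.find(lhs)
        while start != -1:
            if src[:start] + rhs + src[start + len(lhs):] == dst:
                return True
            start = src.find(lhs, start + 1)
    return False


derivation = ["S", "aSBC", "aaBCBC", "aabCBC", "aabBCC",
              "aabbCC", "aabbcC", "aabbcc"]
valid = all(one_step(a, b) for a, b in zip(derivation, derivation[1:]))
```

The context sensitivity shows up in calls such as `one_step("CB", "Cb")`, which fails because B may become b only when it is preceded by a or b.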



With  relational-graph  representation  of  the  rules  in  a  grammar,  we  have 
to  depart  from  conventional  parsing  methods  and  use  graph  transformational 
techniques.   In  Fig.  4.3  we  have  shown  how  a  relational-graph  can  be  used 
to  represent  the  structure  (or  context)  of  a  table. 

4.1   Graph Structure Definitions

Here we define a graph structure, which corresponds to a production
system in conventional string grammars. Let
$$N = \{n_1, n_2, \ldots, n_m, \lambda\}$$
be the set of all nodes in the graph structure, where $\lambda$ is a null
node.

There is a partial ordering between the nodes of a graph structure.
In this partial ordering each node is immediately above all the nodes which
participate in its structural definition.

If $n_j$ is immediately above $n_i$ (in other words $n_j > n_i$), then $n_i$
is said to be an immediate predecessor of $n_j$ in the partial ordering.

Definition:   A primitive node is defined as a node with an empty list of
predecessors. $\lambda$ is a primitive node. All of the nonprimitive nodes
are called super nodes.

Imposing the following restrictions on the partial ordering of nodes,
and introducing structural bonds by labeled branches, we can define the
graphs of our system as follows: Let
    $N_i$ be the set of nodes immediately below $n_i$,
    $N_j$ be the set of nodes immediately below $n_j$;
then we impose that

either     $N_i \cap N_j = $ null

or         $N_i = N_j$, where $N_i \cap N_j = N_i = N_j$.

(a)   parts of an object

(b)   structural representation of the object (a table)

Fig. 4.3.   Context sensitivity of pictures.


The above equivalence relation divides the set of nodes into
equivalence classes. Each class is specified by a set of nodes which are
immediate predecessors to all the nodes in this class. We attribute a graph
$g_i$ to each class as follows:
$$g_i = (N_i, B_i),$$
where $N_i \subseteq N$ is the set of predecessors to this class, and
$B_i \subseteq B$, where for every branch
$$b_k = (n_i, u_p, n_j) \quad (u_p \in U),$$
if $b_k \in B_i$ then $n_i, n_j \in N_i$. Here,
    $B = \{b_1, b_2, \ldots, b_l\}$   set of all the branches in the structure,
    $U = \{u_1, u_2, \ldots, u_k\}$   set of labels for branches,
so, $B \subseteq N \times U \times N$.
Now assume that
    $G = \{g_1, g_2, \ldots, g_q\}$   set of all the graphs in the structure.
Among these graphs, besides the set defined above, there are the graphs
which are formed by the nodes which do not have any successor in the partial
ordering of the nodes. We call this set of graphs the FGS (final graph set).
From the above definitions it is clear that each super node has a graph
associated with it. The set of graphs forms a partial ordering as follows:
Graph partial ordering:   A graph $g_i$ is immediately above $g_j$ if and
only if $g_j$ is associated with one of its super nodes. We call this partial
ordering $\sigma$.
With this preparation we now define our graph structure GS as follows:
$$GS = (N, B, G, \sigma, U, A).$$

Here again,
    $N = \{n_1, n_2, \ldots, n_m, \lambda\}$   set of all nodes.
    $n_i$ is a super node if it has a graph associated with it.
    $n_j$ is a primitive node if it has no graph association.
    $\lambda$ is a null node.

    $U = \{u_1, u_2, \ldots, u_k\}$   set of labels for branches,
    $B = \{b_1, b_2, \ldots, b_l\}$   set of branches in the structure,
where
$$b_k = (n_i, u_p, n_j) \quad \text{and} \quad u_p \in U, \quad n_i, n_j \in N,$$
so,
$$B \subseteq N \times U \times N.$$

    $G = \{g_1, g_2, \ldots, g_q\}$   set of all the graphs in the structure
    (FGS and the set associated with the super nodes),
    $\sigma$ is a partial ordering between graphs,
    $A = NA \cup BA \cup GA = \{a_1, a_2, \ldots, a_r\}$
is a set of attributes for nodes, branches and graphs. Each $a_i$ is a
function such that
$$a_i : \{N \mid B \mid G\} \to V_i,$$
where $V_i$ is a set of values. For example, $a_i(g_j.\mathrm{node1}) = g_k$
is interpreted as: graph $g_k$ is associated with the node "node1" of graph
$g_j$. Equivalently, we may consider the branches as a set of relations
$\{R_{u_1}, R_{u_2}, \ldots, R_{u_k}\} = \{R_u\}$, where
$$R_u \subseteq N \times N,$$
and a branch $b_k = (n_i, u_p, n_j)$ exists if and only if
$(n_i, n_j) \in R_{u_p}$.

By having the graphs form a partially ordered set, we not only economize
the space used for modeling our universe, but also obtain an implied
hierarchy built into the system, which assists us in the search for
domains to which transformations apply.




Once again, to put everything in perspective: in our picture analysis
scheme, a graph structure constitutes our body of knowledge about the
universe. Nodes represent picture primitives or higher level composite
objects. Labeled branches represent binary relations between parts of
objects (syntactical context). Graphical models of the objects represent
the local domain or context of a transformation. As we have seen in chapter
3, the range of a transformation is always a single node. The attributes
carry most of the semantical information of any particular application.
Along with the transformations there are semantical procedures which decide
the combined semantical information to be carried to the formed composites.
An example will help in understanding the above material.

N = {n_1, n_2, n_3, ..., n_10}    set of nodes.

n_1, n_2, n_3, n_4, n_5, n_6, n_9 are primitive nodes,

n_7 is a successor to n_5 and n_6,

n_8 is a successor to n_1, n_2, n_3, and n_4,

n_10 is a successor to n_5 and n_6.
The partial ordering is shown in Fig. 4.4(a).

G = {g_1, g_2, g_3, g_4}

g_1 is associated with super node n_8.

g_2 is associated with super nodes n_7 and n_10.

g_3, g_4 ∈ FGS, i.e. the Final Graph Set.

U = {above, next}    set of branch labels;
introducing the branches, each graph is defined as follows:

g_1 = {n_1, n_2, n_3, n_4, (n_1, next, n_2), (n_2, above, n_3), (n_2, above, n_4)},

g_2 = {n_5, n_6, (n_5, next, n_6)},

g_3 = {n_7, n_8, (n_7, above, n_8)},

g_4 = {n_9, n_10, (n_9, next, n_10)},


[Fig. 4.4. An example of a graph structure: (a) partial ordering between
nodes; (b) graphs of the system; (c) partial ordering σ.]

and σ is the partial ordering of Fig. 4.4(c) on the graph set G.

A = {GRAPH, ...}.
The node attribute "GRAPH" is used to associate graphs with the super nodes.

GRAPH(n_8) = g_1

GRAPH(n_7) = g_2

GRAPH(n_10) = g_2.
Since there is no need for all nodes to have unique names, qualification of
node references can be necessary:

g_1.n_2, g_2.n_5, etc.
But all the nodes in the same graph must have unique names. Here we give
a few examples of SOL functions:

NOF(g_1) = {n_1, n_2, n_3, n_4}

BOF(g_2) = {(n_5, next, n_6)}

INCBR(g_3.n_8) = {(n_7, above, n_8)}

OUTBR(g_1.n_2) = {(n_2, above, n_3), (n_2, above, n_4)}

ADJBR(g_1.n_2) = {(n_1, next, n_2), (n_2, above, n_3), (n_2, above, n_4)}.
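
The example structure of Fig. 4.4 and the SOL functions listed above can be mimicked in a few lines of Python. This is only an illustrative sketch, not the SOL implementation: the `Graph` class, the dictionary layout and the lower-case function names are assumptions made for the example, and branches are stored as (tail, label, head) triples.

```python
from dataclasses import dataclass

@dataclass
class Graph:
    nodes: set      # node names, unique within one graph
    branches: set   # (tail, label, head) triples

# The four graphs of the example (hypothetical in-memory layout).
G = {
    "g1": Graph({"n1", "n2", "n3", "n4"},
                {("n1", "next", "n2"), ("n2", "above", "n3"), ("n2", "above", "n4")}),
    "g2": Graph({"n5", "n6"}, {("n5", "next", "n6")}),
    "g3": Graph({"n7", "n8"}, {("n7", "above", "n8")}),
    "g4": Graph({"n9", "n10"}, {("n9", "next", "n10")}),
}

# Node attribute GRAPH: associates a graph with each super node.
GRAPH = {"n8": "g1", "n7": "g2", "n10": "g2"}

def nof(g):            # nodes of graph g
    return set(G[g].nodes)

def bof(g):            # branches of graph g
    return set(G[g].branches)

def incbr(g, n):       # branches arriving at node n of graph g
    return {b for b in G[g].branches if b[2] == n}

def outbr(g, n):       # branches leaving node n of graph g
    return {b for b in G[g].branches if b[0] == n}

def adjbr(g, n):       # all branches touching node n of graph g
    return incbr(g, n) | outbr(g, n)

def is_super(n):       # a node is a super node if it has a graph associated
    return n in GRAPH
```

Qualified references such as g_1.n_2 become the argument pair `("g1", "n2")` in this sketch.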

4.2   Parsing  the  Graphical  Representation  of  the  Scene 

In string languages, parsing is accomplished by either bottom-up or
top-down methods. Top-down parsing is goal oriented: we set up goals and
try to reach them through a set of subgoals in the parsing tree. When a set
of subgoals fails, we try an alternative set of subgoals, and if all the
sets of subgoals achieving the same goal have failed, we have to set up a
new goal and proceed. In bottom-up parsing we depend on local evidence. When
enough evidence satisfying a subgoal phrase is at hand, we parse this
evidence as that subgoal and proceed. If an accomplished subgoal fails to
produce any higher level goals, then it is an erroneous conclusion and we
have to backtrack. Our method of parsing is similar to bottom-up
parsing, but here graphs, instead of symbols, are pushed onto our parsing
stack. We can also use top-down parsing when it is heuristically beneficial
to do so. Before discussing our parsing method we make the following
definitions. As we have mentioned before, each super node in our graph
structure has a graph associated with it.

Definitions: A son-set of a graph is the set of graphs which are associated
with its super nodes.

A father-set of a graph is the set of graphs which have this
graph associated with one of their super nodes.

The primitive-graph set (PGS) is the set of graphs in which all of
the nodes are primitive.

Now we define our parsing graph as follows: The parsing graph is a graph
which has the same number of nodes as the number of graphs in our graph
structure. Each node has one graph associated with it. A set of weighted
branches leaves each node and arrives at the nodes whose associated graphs
are the sons of the graph associated with this node. Since there is a
one-to-one correspondence between the nodes in this graph and the graphs in
the graph structure, we use them interchangeably.
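
The son-set, father-set, PGS and parsing graph can be derived mechanically from the graph structure. The following is a minimal sketch using the example of Fig. 4.4; the dictionary names and the uniform default branch weight are assumptions made for illustration, since the text does not fix how the weights are initialized.

```python
# Nodes of each graph and the GRAPH attribute of the super nodes (Fig. 4.4).
NODES = {
    "g1": {"n1", "n2", "n3", "n4"},
    "g2": {"n5", "n6"},
    "g3": {"n7", "n8"},
    "g4": {"n9", "n10"},
}
GRAPH = {"n8": "g1", "n7": "g2", "n10": "g2"}

def son_set(g):
    """Graphs associated with the super nodes of g."""
    return {GRAPH[n] for n in NODES[g] if n in GRAPH}

def father_set(g):
    """Graphs having g associated with one of their super nodes."""
    return {f for f in NODES if g in son_set(f)}

def primitive_graph_set():
    """Graphs all of whose nodes are primitive (PGS)."""
    return {g for g in NODES if not son_set(g)}

def parsing_graph(weight=1.0):
    """One node per graph; a weighted branch from each graph to each son."""
    return {(f, s): weight for f in NODES for s in son_set(f)}
```

In this layout OUTBR of a parsing-graph node corresponds to the keys whose first component is that node, and INCBR to the keys whose second component is that node.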

There are several advantages in representing our parsing tree in a
graphical form:

(a) The dynamic nature of a graph makes modification very easy, which is
essential in any system with learning capability. For example, addition
and deletion of graphical rules to the graph structure is straightforward.

(b)  Through  association  of  variables  with  branches  we  can  affect  the 
parsing  order. 

(c)  We  can  use  SOL  to  manipulate  this  graph. 



Fig. 4.5 shows the structure of this graph. OUTBR(node) specifies
the set of branches whose head nodes form the son-set of this node.
INCBR(node) specifies the set of branches whose tail nodes form the
father-set of this node.

As we have seen in chapter 2, scenes are represented as graphs whose
nodes correspond to primitive regions of the pictures and whose branches
represent binary relations between these regions. Node attributes (shape,
color, texture, size, etc.) carry semantic information about each node.

The purpose of parsing is to find the set of recognizable objects in
the scene which have a graphical model in our graph structure (universe).
Fig. 4.6 shows the dynamic parsing tree, which has its root at the
attention point (AP) and propagates to higher level objects. The parsing
procedure for each object is complete when the recognized graph belongs to
the final graph set (FGS), or when it is a meaningful object and none of its
fathers can be recognized. When all the fathers of a recognized rule have
failed and this rule does not represent a meaningful object in our universe,
the inverse transformation of this rule is applied to the scene, restoring
the environment to its state before the rule was applied. Then the next
rule in the ordered set at the previous level is tried.

Fig. 4.7 shows the flow chart of the basic recognizer. Here we have
avoided the details of each sub-function, which are described in the
following subsections. There are also other easily implementable concepts
which we will introduce later on in this chapter and in chapter 6.

a.  Best-Match Concept: This enables us to recognize incomplete or
partially hidden objects in the scene.

b.  Scoring Techniques: This will enhance the performance of the system
in a repetitious environment.


[Fig. 4.5. The graphical representation of parsing graph. Labels: FGS, PGS.]

[Fig. 4.6. Dynamic parsing tree. Legend: terminating, successful, unexplored.]

[Fig. 4.7. Flow chart of basic recognizer. Box labels: relation
preprocessing; nodes' class assignment; select an attention point and form
the ordered set of entry points to the parsing graph; push AP into the
stack A; initialize P to the first element of the list; search the domain
implied by this entry point; P = NEXTOF(P); parse (transform subgraph
domain), and take care of embedding branches; push the newly formed node
(range of transformation) into stack A and pointer P into stack B;
initialize P to the first element of the ordered list of fathers to this
graph; backtrack from last recognized rule; pop out stack A and stack B;
pop out AP from stack A and mark this node; report the result of analysis.]


c.  Heuristic Techniques: These techniques make use of more global
information to enhance the performance of our system.

d.  Learning Capabilities: These are the functions which enable our
system to add or combine objects in the universe and also enhance the
performance of the recognizer. We discuss these aspects of recognition in
chapter 6.

Our  recognizer  functions  mainly  in  two  modes: 

1.  Learning  mode:   saves  all  the  useful  acquired  information  from 
this  experience  for  further  use  in  future. 

2.  Non-learning  mode:   The  extent  of  information  saving  is  negligible 
compared  to  learning  mode. 

4.2.1   Relation  preprocessing 

By exploiting certain properties of the relations used in the
representation of our universe, we can substantially reduce the number of
relations or facilitate further manipulation.

The class of relations we described in chapter 3 is closed under
inversion. We can therefore describe our graph structure by using only the
relation r instead of both r and r^-1. In the graphical input of the scene
the branches with label r^-1 are relabeled as r and the direction of the
branch is reversed. This preserves the syntactic and semantic implications
of the scene while greatly reducing the recognition effort. Other
properties of these relations, like transitivity, can be taken care of at
the preprocessing level (by creating all implied branches, or by creating
them as they are needed in the course of action). These functions are
performed in the relation preprocessing action box of Fig. 4.7.
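
As a rough sketch of this preprocessing step, the code below relabels inverse relations and, optionally, creates the branches implied by transitivity. The `INVERSE` table and its labels ("below", "previous") are hypothetical; the actual relation set is the one defined in chapter 3.

```python
# Hypothetical table mapping each inverse label r^-1 to its canonical label r.
INVERSE = {"below": "above", "previous": "next"}

def normalize(branches):
    """Relabel r^-1 branches as r and reverse their direction."""
    out = set()
    for tail, label, head in branches:
        if label in INVERSE:
            out.add((head, INVERSE[label], tail))
        else:
            out.add((tail, label, head))
    return out

def transitive_closure(branches, label):
    """Create all branches implied by transitivity of one relation label."""
    closed = set(branches)
    changed = True
    while changed:
        changed = False
        for a, l1, b in list(closed):
            for c, l2, d in list(closed):
                if l1 == l2 == label and b == c and (a, label, d) not in closed:
                    closed.add((a, label, d))
                    changed = True
    return closed
```

Creating all implied branches up front trades space for search time; creating them on demand, the alternative named in the text, would replace the closure by a reachability test at match time.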

4.2.2   Nodal class assignment

Nodes in the original scene are picture primitives and must be assigned
to one or more primitive nodes of our graph structure. This assignment is
carried out through the classification of the attribute values assigned to
each primitive node. For continuous attribute values the number of possible
physical nodes (primitive regions) is infinite, but by dividing the
continuous values of the attributes into discrete intervals, we partition
the infinite set of primitive nodes into a finite set of primitive classes.

For each attribute value of a primitive, we assign the primitive to the
class of primitives whose value interval for this attribute includes the
specified value. If for some attribute value there is no specified class,
we mark this node as noise.

Definition: Two primitive nodes are semantically equivalent if and only if
for all the specified attribute values they have the same class assignment.

This  function  is  shown  in  action  box  A   of  Fig.  4.7. 
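
The interval-based class assignment can be sketched as follows. The attribute names and interval boundaries in `BOUNDS` are invented for the example, and the intervals are assumed to be half-open; the text does not specify either.

```python
from bisect import bisect_right

# Hypothetical interval boundaries per attribute; class k covers the
# half-open interval [BOUNDS[a][k], BOUNDS[a][k+1]).
BOUNDS = {
    "size": [0.0, 10.0, 100.0, 1000.0],
    "compactness": [0.0, 0.5, 1.0],
}

def classify(attr, value):
    """Return the class index for one attribute value, or None if no class."""
    b = BOUNDS[attr]
    if value < b[0] or value >= b[-1]:
        return None
    return bisect_right(b, value) - 1

def class_assignment(node_attrs):
    """Map every specified attribute to its class; None marks the node as noise."""
    classes = {a: classify(a, v) for a, v in node_attrs.items()}
    if any(c is None for c in classes.values()):
        return None
    return classes

def semantically_equivalent(p, q):
    """Two primitive nodes are equivalent iff all class assignments agree."""
    cp, cq = class_assignment(p), class_assignment(q)
    return cp is not None and cp == cq
```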

4.2.3   Selection  of  an  attention  point 

In  this  section  we  will  describe  the  procedure  which  discovers  the  set 
of  feasible  rules  (entry  points  to  the  parsing  graph)  to  be  tried,  starting  at 
any  primitive  region  which  we  call  an  attention  point  (AP) . 

It  is  clear  that  each  primitive  class  will  indicate  a  set  of  primitive 
nodes  in  our  graph  structure,  and  these  primitive  nodes  in  turn  will  imply 
a  set  of  graphical  rules  (entry  points  into  the  parsing  graph)  which  have 
one  of  these  primitives  as  their  nodes.   So  with  each  primitive  class  we 
will  have  a  set  (possibly  ordered)  of  entry  points  associated  with  it. 

For each node of the scene selected as an AP, we follow this procedure:

S: set of entry points

a.  S = ∅ (null set)

b.  For each specified attribute value class C_i, assume S_i is
    the set of associated entry points.

        IF S = ∅ THEN S = S_i
        ELSE S = S ∩ S_i.

    If S = ∅ THEN mark this node as noise and pick another
    AP and go to a. If this is not possible the
    recognition is complete.

c.  Repeat b. for the next attribute. If all the specified attributes
    have been tried then go to d.

d.  Test the acceptability of this AP (for example CARD(S) must
    be reasonably small). If it is acceptable then leave this
    routine and proceed. Otherwise, use some heuristics to
    improve upon this.
One of the possible heuristics is to pick a structurally tied
neighboring node and develop the set of entry points for it as well. Then,
by intersecting these two sets, we can find a smaller set of entry points.
Action box A   in Fig. 4.7 refers to this procedure.
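
Steps a. to d. can be sketched as below. The table `entry_sets`, keyed by (attribute, class) pairs, and the acceptability threshold `max_card` are assumptions of this sketch; the neighbor-intersection heuristic is not included.

```python
def entry_points(node_classes, entry_sets):
    """Intersect the ordered entry-point lists of all attribute classes.

    node_classes: attribute -> class of the candidate attention point.
    entry_sets:   (attribute, class) -> ordered list of graphical rules.
    Returns [] when the node has to be marked as noise.
    """
    s = None                                  # None plays the role of "S not yet set"
    for attr, cls in node_classes.items():
        s_i = entry_sets.get((attr, cls), [])
        s = list(s_i) if s is None else [e for e in s if e in s_i]
        if not s:
            return []
    return s or []

def select_attention_point(scene, entry_sets, max_card=3):
    """Try the scene nodes in turn; return (AP, entry points, noise nodes)."""
    noise = set()
    for node, classes in scene.items():
        s = entry_points(classes, entry_sets)
        if not s:
            noise.add(node)                   # no rule can start here
        elif len(s) <= max_card:              # CARD(S) reasonably small
            return node, s, noise
    return None, [], noise                    # recognition is complete
```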
4.2.4   Search  of  domains  for  transformations 

The  implied  entry  point  in  the  parsing  graph  will  call  for  the  search 
of  the  domain  corresponding  to  its  associated  graph.   The  algorithm  of  this 
search  is  given  below,  but  to  understand  this  algorithm  let  us  make  a  few 
points  clear. 

Definition:   Two  super  nodes  are  semantically  equivalent  if  and  only  if  they 
have  the  same  graphical  rule  associated  with  them. 

The search of domain is a recursive procedure since the super nodes
require that their corresponding transformations be applied to the scene.
The parsing procedure will be called when a domain corresponding to a super
node is found. The backup procedure will undo the parsing when a wrong
decision has been made. This can happen in one of the following three ways:

1.  A  wrong  branch  correspondence  has  been  made. 

2.  The  embedding  rules  have  eliminated  a  necessary  branch. 

3.  The  match  is  not  complete  and  we  have  to  select  another  equivalence  in 
the  graphical  rule  for  the  starting  point. 

Outline  of  the  Algorithm  Used  for  Domain  Discovery 

1.  Find the set of all nodes in the graphical rule which are semantically
    equivalent to the starting node in the scene.

2.  If the list is exhausted, return the best match. (There should be enough
    information in the best match to enable us to reparse the back-tracked
    rules.) Otherwise pick up the next element (E_1) of the list of
    equivalences as a match for the starting node (E_2).

3.  Form the ordered list of the ADJACENT branches to this equivalent node
    (call it S_1) and to E_2 (call it S_2). Initialize pointers P_1 and P_2
    to S_1 and S_2, respectively.

4.  Find a branch in S_2, starting at P_2, which is equivalent to the branch
    pointed to by P_1. If such a branch exists and is not already matched
    with another branch in the rule, then go to 5. Otherwise, no match
    can be found for the branch pointed to by P_1:

        put P_1 = NEXTOF(P_1), initialize P_2 to the set S_2, and go to 4.

5.  Compare the end nodes of the branches pointed to by P_1 and P_2. If they
    are equivalent, reflect this fact into the MERIT of the current match.
    [Note: If the end node in the graphical rule is a super node, the
    discovery of its equivalence in the scene will involve a recursive call
    to this routine; besides, we have to check that the branch P_2 leading
    to this match will remain valid after parsing. If P_2 was eliminated as
    the result of parsing, we have to back up from this newly formed super
    node, put P_2 = NEXTOF(P_2) and go to 4.]

        Mark these branches (P_1 and P_2) and their end nodes, and put
        P_1 = NEXTOF(P_1).

        If P_1 = NULL then look for a matched node in the rule which has
        not yet been picked as the starting point.
            If such a node exists, replace E_1 with it and E_2 with its
            equivalence in the scene; go to 3.
            Otherwise, test for completion of the search. If the search is
            complete then return the set of matched nodes, representing the
            domain. Otherwise, set E_2 to the initial starting node, back up
            from recognized super nodes (except the one that may correspond
            to the starting node), save the best match, and go to 2.

        Otherwise, go to 4.

    Otherwise (the end nodes are not equivalent), put P_2 = NEXTOF(P_2) and
    go to 4.

This procedure corresponds to action box A   of Fig. 4.7. A more
detailed explanation of these algorithms is case dependent. We will clarify
them in connection with our example universe in chapter 5. We also have to
point out that the above algorithm does not exhaust all the possibilities
(for example, a branch and a node may both have a match while the match is
still wrong), but such cases seldom happen; and even if they do, they
will change the context, which eventually will force us to back up and
correct them.
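
To make the flow of the domain search more concrete, the sketch below implements a much simplified variant: a greedy breadth-first match of branches and end nodes starting from one pair of equivalent nodes. It is not the algorithm above in full. It has no backtracking, no recursive treatment of super nodes and no best-match bookkeeping, and its merit (fraction of rule nodes matched) is an assumption. Node classes are given as plain dictionaries.

```python
def adjacent(branches, node):
    """Ordered list of the branches touching a node (the ADJACENT branches)."""
    return sorted(b for b in branches if node in (b[0], b[2]))

def match_domain(rule, scene, rule_cls, scene_cls, e1, e2):
    """Greedy match of a rule graph against the scene, starting at (e1, e2).

    rule, scene: sets of (tail, label, head) branches.
    rule_cls, scene_cls: node -> primitive class.
    Returns (mapping rule node -> scene node, merit in [0, 1]).
    """
    mapping, used_scene, done_rule = {e1: e2}, set(), set()
    frontier = [e1]
    while frontier:
        r = frontier.pop(0)
        s = mapping[r]
        for rb in adjacent(rule, r):            # role of pointer P_1 over S_1
            if rb in done_rule:
                continue
            r_out = rb[0] == r
            r_end = rb[2] if r_out else rb[0]
            for sb in adjacent(scene, s):       # role of pointer P_2 over S_2
                s_out = sb[0] == s
                s_end = sb[2] if s_out else sb[0]
                if sb in used_scene or sb[1] != rb[1] or s_out != r_out:
                    continue                    # branches not equivalent
                if r_end in mapping:
                    ok = mapping[r_end] == s_end
                else:
                    ok = (rule_cls[r_end] == scene_cls[s_end]
                          and s_end not in mapping.values())
                if ok:                          # end nodes equivalent: mark them
                    if r_end not in mapping:
                        mapping[r_end] = s_end
                        frontier.append(r_end)
                    used_scene.add(sb)
                    done_rule.add(rb)
                    break
    return mapping, len(mapping) / len(rule_cls)

# Rule g_1 of Fig. 4.4 with invented primitive classes, and a small scene.
RULE = {("n1", "next", "n2"), ("n2", "above", "n3"), ("n2", "above", "n4")}
RULE_CLS = {"n1": "A", "n2": "B", "n3": "C", "n4": "C"}
SCENE = {("r1", "next", "r2"), ("r2", "above", "r3"),
         ("r2", "above", "r4"), ("r5", "next", "r1")}
SCENE_CLS = {"r1": "A", "r2": "B", "r3": "C", "r4": "C", "r5": "D"}
```

Calling `match_domain(RULE, SCENE, RULE_CLS, SCENE_CLS, "n2", "r2")` maps all four rule nodes; removing a scene region lowers the merit, which is where a best-match criterion would take over.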

4.2.5  Actual  parsing  of  a  found  domain 

As we mentioned in chapter 3, in our class of transformations the range
of a transformation is a single node, and a single parsing routine can
perform all the transformations as follows:

a.  create  a  super  node, 

b.  associate  the  graphical  rule  which  caused  this  transformation  with  this 
node , 

c.  take  care  of  embedding  branches. 

When  the  parsing  procedure  is  applied,  the  nodes  and  branches  of  the 
domain  will  become  temporarily  inactive.   As  far  as  further  processing  is 
concerned  this  is  equivalent  to  removing  these  elements  from  the  scene 
graph.   The  branches  cutting  the  domain  are  also  temporarily  made  inactive; 
but  if  our  embedding  rules  dictate  a  replacement,  new  branches  are  embedded 
between  the  nodes  outside  and  the  newly  created  node. 

4.2.6  Back-up  procedure 

Back-up  procedure  performs  the  inverse  action  of  the  parsing  procedure. 
The  super  node  and  all  its  adjacent  branches  become  inactive.   All  the  nodes 
and  branches  of  its  domain  and  also  the  branches  cutting  into  the  domain 
will  once  again  become  active  elements  of  the  scene  graph. 

When  this  procedure  is  called  within  A  ,  it  will  be  called  recursively 
until  no  super  node  is  left  in  the  domain  of  the  back  tracked  super  node. 
Since  some  of  the  elements  in  the  best  match  set  might  point  out  to  these 
back  tracked  super  nodes,  we  do  not  eliminate  their  memories  when  we  make 
them  inactive.   Then,  it  is  possible  to  transform  their  domains  back  to  this 
level  of  parsing  without  repetitive  and  extensive  search  for  the  domain  which 
is  already  found.   This  is  very  useful  in  connection  with  the  best  match 
feature  of  our  recognizer. 
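
The parsing procedure of section 4.2.5 and its inverse can be sketched together with a small scene class. The class name, the `embed` predicate standing in for the embedding rules, and the shape of the saved record are assumptions of this sketch. Inactive elements are simply kept out of the active sets, and the record of a backed-up super node is retained so that its domain can be reparsed without a new search.

```python
class Scene:
    def __init__(self, nodes, branches):
        self.active_nodes = set(nodes)
        self.active_branches = set(branches)    # (tail, label, head) triples
        self.memory = {}                        # super node -> saved record

    def parse(self, super_node, rule, domain, embed=lambda branch: True):
        """Replace the domain by one super node associated with `rule`."""
        domain = set(domain)
        inner = {b for b in self.active_branches
                 if b[0] in domain and b[2] in domain}
        cut = {b for b in self.active_branches
               if (b[0] in domain) != (b[2] in domain)}
        # Embedding: re-attach accepted cut branches to the new super node.
        new = {(super_node if t in domain else t, label,
                super_node if h in domain else h)
               for t, label, h in cut if embed((t, label, h))}
        self.active_nodes = (self.active_nodes - domain) | {super_node}
        self.active_branches = (self.active_branches - inner - cut) | new
        self.memory[super_node] = (rule, domain, inner, cut, new)

    def back_up(self, super_node):
        """Inverse of parse; the saved record is kept for later reparsing."""
        rule, domain, inner, cut, new = self.memory[super_node]
        self.active_nodes = (self.active_nodes - {super_node}) | domain
        self.active_branches = (self.active_branches - new) | inner | cut
```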

4.2.7  Heuristics 

Heuristics are generally techniques which avoid exhaustive search and
take a short path to the goal. There are several heuristics which are very
useful in speeding up the parsing procedure.

1.  Assignment of values to the variables associated with the branches in
    the parsing graph. These values can affect the order in which the
    successors are tried. The recognizer can change these values
    dynamically through experience.

2.  We can also employ a scoring system which can affect the ordering
    of the list of entry points associated with each primitive class.

3.  At any point of the search we can pick another node, besides the one
    under consideration, which has a structural tie to the latter node. Then
    instead of trying the set of successors to each node, we may try only
    the intersection of these two sets. We can try this procedure
    repeatedly until we are satisfied.

    If we follow the procedure below in forming the intersection
    of two ordered sets, we will preserve the ordering.

    S = S_1 ∩ S_2

    a.  Mark all the elements of S_1.

    b.  Scan the elements of S_2 in order, and include each element in S
        (at the bottom) if it is marked.

4.  There are some semantical associations between the objects which occur
    in the same scene. When the presence of an object is highly probable
    (like chair and table), we can form a set of all its descendant rules
    and itself. Then we intersect this set with the set of entry
    points inferred from the current node (i.e. AP), and reorder the elements
    in the latter set such that the elements of the intersection set will be
    tried first.

5.  In the recursive call of the search for domain (looking for sub-domains),
    less than complete match is normally acceptable.
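
Heuristics 3 and 4 reduce to two small list operations, sketched here with lists standing for ordered sets. The function names are chosen for this example only.

```python
def ordered_intersection(s1, s2):
    """S = S_1 ∩ S_2, keeping the order in which the elements occur in S_2."""
    marked = set(s1)                           # a. mark all the elements of S_1
    return [e for e in s2 if e in marked]      # b. scan S_2 in order

def prioritize(entry_points, expected):
    """Heuristic 4: try the entry points in the expected set first."""
    first = [e for e in entry_points if e in expected]
    rest = [e for e in entry_points if e not in expected]
    return first + rest
```

Both functions are stable, so any ordering established by scoring (heuristic 2) survives them.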




4.3   Best-Match  Feature  of  the  Recognizer 

In  the  real  world,  the  objects  do  not  always  appear  in  the  same  way 
they  are  specified  in  the  models.   They  can  be  occluded  or  partially  hidden 
from  the  view.   Our  basic  recognizer  can  be  easily  modified  to  enable  it  to 
handle  this  partial  matching. 

Definition:   Figure  of  Merit  (FOM)  is  a  variable  which  defines  the  degree 
of  partial  matching.   Its  value  ranges  from  0  to  1,  where  0  specifies  no 
matching,  while  1  is  for  complete  matching. 

Let  us  define  the  recursive  function  EXPAND  as  follows: 

EXPAND:  Primitive node → Primitive node
      :  Super node → EXPAND{set of nodes in its domain}.

And it has the following property,

EXPAND{a, b} = EXPAND(a), EXPAND(b).

If S_1 is the set of nodes in the domain and S_2 the set of nodes in the
graph currently being matched, then

FOM = |EXPAND(S_1)| / |EXPAND(S_2)|

is our definition for figure of merit. In Fig. 4.8(a) we have shown an
example of a domain. In this example,

S_1 = {A_1, a_1, a_2, a_9, b_1, b_2}.

EXPAND(S_1) = EXPAND{A_1, a_1, a_2, a_9, b_1, b_2}
            = EXPAND(A_1), a_1, a_2, a_9, EXPAND{a_3, a_4}, EXPAND{a_5, a_6, c_1}
            = EXPAND(A_1), a_1, a_2, a_9, a_3, a_4, a_5, a_6, EXPAND{a_7, a_8}
            = EXPAND(A_1), a_1, a_2, a_9, a_3, a_4, a_5, a_6, a_7, a_8.

Since the cardinality of the EXPAND for each super node in the graph
structure is a constant, we can have this associated with the nodes in the
parsing graph. This constant is helpful in the sense that we can find
|EXPAND(·)| of a super node by having the FOM for its domain. For example,
if A_1 (starting point) was a super node and its associated graph g_i had
the constant CONS_i, then

|EXPAND(A_1)| = FOM_i × CONS_i.

[Fig. 4.8. An example of a graphical rule and its corresponding domain in
the scene: (a) maximal domain matching the graphical rule; (b) graphical
rule being matched.]

Definitions: MFOM (Minimal Figure of Merit) is the minimum value of FOM
for which we consider that any partial matching has occurred at all.

AFOM (Acceptable Figure of Merit) is the level of partial
matching which we accept as a match when the algorithm of domain search is
called recursively. The domain is actually parsed to a super node if and
only if this level of partial matching is achieved.

We can calculate FOM_D (the figure of merit for domain D) as follows:

FOM_D = (N_D + Σ_i FOM_i × CONS_i) / CONS_D

Here,

N_D:     number of primitive nodes in the domain,
i:       index for a super node in the domain,
FOM_i:   figure of merit for super node i,
CONS_i:  constant associated with the graphical rule matching with the
         domain of this super node,
CONS_D:  constant associated with the graphical rule being matched.
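
The following sketch evaluates EXPAND recursively and FOM_D from the constants. The `DOMAIN` table reuses the super nodes b_1, b_2 and c_1 of the worked example (A_1 is left out because its domain is not given); the CONS values and the partial match used in the usage note are invented for illustration.

```python
# Super node -> nodes of its matched domain (from the EXPAND example).
DOMAIN = {
    "b1": ["a3", "a4"],
    "b2": ["a5", "a6", "c1"],
    "c1": ["a7", "a8"],
}

def expand(nodes):
    """Replace every super node, recursively, by the primitives of its domain."""
    out = []
    for n in nodes:
        out.extend(expand(DOMAIN[n]) if n in DOMAIN else [n])
    return out

def fom_domain(n_primitive, supers, cons_d):
    """FOM_D = (N_D + sum_i FOM_i * CONS_i) / CONS_D.

    supers: list of (FOM_i, CONS_i) pairs, one per super node in the domain.
    """
    return (n_primitive + sum(f * c for f, c in supers)) / cons_d
```

For the domain {a_1, a_2, a_9, b_1, b_2}, `expand` yields nine primitives. With hypothetical constants CONS = 2 for b_1 (fully matched) and CONS = 5 for b_2 (matched with FOM 0.8), and CONS_D = 12 for the rule being matched, `fom_domain(3, [(1.0, 2), (0.8, 5)], 12)` gives 9/12.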

In our recognition algorithm, at any stage of the search there is a set of
possible successors that can be tried. Fig. 4.9 shows the algorithm of
search for the best match at any one stage. In this flow chart the variables
have the following interpretations:

AP:     attention point, which is the result of the last stage of search
        (can be either primitive or super node),

BMFOM:  figure of merit for the best match,

S:      ordered set of successors to AP,

S_D:    domain of the current alternative (set of nodes),

FOM_D:  figure of merit for the current alternative,

BS:     set of nodes in the best match domain.

[Fig. 4.9. Flow chart of the best match algorithm. Box labels: initialize P
to the ordered set S; search for domain S_D; BMFOM = FOM_D, BS = S_D;
restore the domain to the condition before the search for S_D was
initiated; NEXTOF(P); search is complete for this stage; the best match is
recognized as the match for this stage; proceed to the next stage of
search; back up to the former stage of search; terminate search for this
object.]

When our task is to pick the best of several alternatives, it is obvious
that the conditions must be exactly the same before we try each alternative.
This is accomplished by recursive calls to the back-up procedure for each
super node (excluding AP) in the current domain before the next alternative
is tried. But to avoid repetitive search for the best match, the structure
of hierarchical clustering in its domain is saved to be reparsed later on
without actual searching.

In the domain search, when we have more than one node in the graphical
rule which matches with AP, we are faced with the same task of selecting
the best match. This is accomplished in the same manner as in the recognizer.
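
One stage of the best-match search can be sketched as a loop over the ordered successors. The flow chart itself is not reproduced in the text, so the details here are assumptions: `search` and `restore` are caller-supplied stand-ins for the domain search and the back-up procedure, MFOM is used as the initial threshold, and ties keep the earlier alternative.

```python
def best_match(successors, search, restore, mfom=0.0):
    """Try every alternative in S under identical conditions; keep the best.

    search(p)  -> (S_D, FOM_D), with S_D None when no domain was found.
    restore(p) -> puts the scene back to its state before the search for p.
    Returns (best alternative and its domain, or None; BMFOM).
    """
    bmfom, bs = mfom, None
    for p in successors:                 # P runs over the ordered set S
        s_d, fom_d = search(p)
        if s_d is not None and fom_d > bmfom:
            bmfom, bs = fom_d, (p, s_d)  # BMFOM = FOM_D, BS = S_D
        restore(p)                       # same conditions for the next one
    return bs, bmfom
```

A result of `None` corresponds to backing up to the former stage of search; otherwise the saved domain of the winner can be reparsed without searching again.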

4.4   Other  Useful  Transformations 

There  are  several  useful  operations  that  can  be  applied  to  the  scenes 
as  well  as  to  our  universe  which  expand  the  ability  of  the  recognizer  to 
recognize  the  objects  in  a  more  realistic  world.   These  operations  can  be 
thought  of  as  transformations  which  conform  the  scenes  to  our  universe  or 
modify  the  universe  to  have  the  knowledge  transformed  to  a  form  which  helps 
the  scene  interpretation. 

Among this class of transformations are the ones which can be applied
to images of the cerebral cortex to discover deformity. In matching the
cross sections of a brain to the brain atlas, some of the region parameters
or relation structures could deviate from the atlas, and by defining a
transformation which makes the matching possible, we will have discovered
the deformity, which in turn will tell us about its effects on different
sections. Guzman was the pioneer [19] in discovering a simple set of rules
which can be used to transform the scenes into their constituent
3-dimensional bodies. He mainly uses the properties of vertex types to merge
their associated faces. This can be thought of as a preprocessor to our
3-dimensional, model-based parsing scheme. We have used some of the simple
rules (T-joint, Y-joint, etc.) to demonstrate its compatibility with our
modeling system, and the results will be discussed in chapter 5.

4.4.1   Rotational  transformations 

A  graphical  model  of  an  object  contains  all  the  pictorial  knowledge 
about  that  object.   As  a  matter  of  fact,  the  relational  description  of  the 
knowledge  about  the  universe  should  be  flexible  enough  to  allow  us  to  reflect 
modifications  caused  by  an  external  or  internal  factor.   For  example,  in 
sociology  hate-like  relationships  could  be  described  in  a  relational  graph. 
Now  if  the  effect  of  external  factors  like  heat  or  cold  or  internal  factors 
like  death  and  marriage  are  known,  then  we  should  be  able  to  transform  the 
body  of  knowledge  to  arrive  at  the  new  relationships.   Here  in  connection 
with  picture  processing  we  define  the  rotational  transformations  which  from 
the  unique  models  of  the  objects  will  construct  the  models  for  the  same 
objects in different orientations. In chapter 3 we defined a set of three
operations on relations and found that the class of relations we chose
is closed under these operations. Using these operations we can now define
the following rotational transformations: T_R90, T_R180, T_R270.

1.  For T_R90, T_R180, and T_R270 apply the corresponding operation of
    chapter 3 to the set of relations in the graphical model.




2.  Semantics  of  the  nodes  representing  object's  primitive  regions  are 
modified  to  represent  their  projections  at  this  angle. 

3.  Semantics  of  the  super  nodes  are  unchanged. 

4.  Eliminate  the  parts  which  are  now  hidden  from  our  view  because  of  the 
new  orientation.   We  will  have  the  relative  size  of  each  part  associated 
with  the  node  representing  it. 

Now if one of the branches in the graph tells us that a smaller part is
located directly behind a bigger part, its corresponding node should be
mapped to a null node by this transformation (in other words, deleted from
the graphical representation). Or if it will be partially hidden, this fact
should be reflected in the semantics of the node representing it. For
example, the parts of a human face will disappear from view under a 180°
rotation. In chapter 5 we will show the results of our experimentation with
a few 3-dimensional models using these transformations. Since these
transformations are irreversible (deletion), we have to use a separate copy
of the graphical rule for each transformation.


5.   APPLICATION 


We have developed a general methodology for global picture analysis
by graph transformations. In this chapter we show the results of
applying this methodology to a simple class of cartoons. We have avoided
mentioning the detailed algorithms since they are the same procedures
outlined in chapter 4. In this application we have also tried to conform as
much as possible to the general procedural steps discussed in the last
chapter. Although we have made this application quite simple for reasons of
clarity, it involves all the basic factors and complexity of the general
case. We assume the objects in the scene are well formed and, for the time
being, non-occluded.

5.1   Primitive Classes

We have learned that the nodes in the graphical representation of the input
picture represent the picture's regions, primitive nodes in the model graphs
represent the primitive regions (or a set of alternative primitive regions)
in the model, and super nodes represent higher level objects (a graph in
our graph structure). We also know that the semantic equivalence of two
nodes is defined as follows:

Two  primitive  nodes  are  semantically  equivalent,  if  for  all  of  their 
specified  attributes  they  belong  to  the  same  primitive  classes.   Two 
super  nodes  are  semantically  equivalent  if  they  have  the  same  graph 
associated  with  both. 

Now we have to define the attributes of our primitive nodes, which divide the
primitive nodes into primitive classes such that a set of these classes, one
for each attribute, defines a primitive region.




5.1.1  Shape  attribute 

Table 5.1 shows the class of prototype shapes which divide
our primitive regions into 29 equivalence classes. Methods discussed by
Maruyama in [3] can be used to assign each primitive region to one of these
classes (the closest in shape). When no shape assignment is possible we
assign the primitive to the class of primitives which have an undefined
shape (#30).

5.1.2  Compactness  attribute 

This attribute can actually be regarded as a feature of the shape attribute, but
it is treated separately to emphasize its relative importance and also to have
more than one attribute for the primitive regions. This attribute is
useful in that it divides the primitives into classes of narrow and broad
regions. It is defined as follows:

$$L = 16\left(1 - \frac{2\sqrt{\pi A}}{P}\right) + 1,$$

where A is the area and P is the perimeter of the region. The constant 16
has no general significance beyond the fact that we found it appropriate for
this particular application. This attribute divides our primitives
into 7 distinct classes as follows:

class #1:   1 < L < 3
class #2:   3 < L < 6
class #3:   6 < L < 8
class #4:   8 < L < 10
class #5:  10 < L < 12
class #6:  12 < L < 15
class #7:  16 < L .

The value of attribute L is 1 for a circle and 17 for any region with 0 area.
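As a check on the definition, the compactness value and its class can be computed with a short Python sketch. Two points are assumptions of this sketch, not statements from the text: each class is treated as a half-open interval that includes its lower bound (so that a circle, with L = 1, falls in class #1), and values between 15 and 16, which the printed table leaves unassigned, are placed in class #7. The function names are illustrative.

```python
import bisect
import math

def compactness(area, perimeter):
    """L = 16 * (1 - 2*sqrt(pi*A)/P) + 1, as defined above."""
    return 16.0 * (1.0 - 2.0 * math.sqrt(math.pi * area) / perimeter) + 1.0

# Upper bounds of classes #1..#6 (assumption: lower bound inclusive,
# and everything from 15 upward is assigned to class #7).
_UPPER_BOUNDS = [3, 6, 8, 10, 12, 15]

def compactness_class(value):
    """Map a compactness value L to a class number 1..7."""
    return bisect.bisect_right(_UPPER_BOUNDS, value) + 1

circle_L = compactness(math.pi * 2.0 ** 2, 2.0 * math.pi * 2.0)  # circle, r = 2
square_L = compactness(1.0, 4.0)                                  # unit square
line_L = compactness(0.0, 10.0)                                   # zero area
```

A circle gives L = 1 (up to rounding) and a zero-area region gives L = 17, in agreement with the remark above; a unit square falls in class #1.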


Table  5.1.   Table  of  the  values  for  shape  attribute 




5.2   Relations 

We have chosen to use the same class of relations as those given in
chapter 3. There we found that this class is closed under inversion:

(NEXT)^{-1} = NEXT
(CONTAIN)^{-1} = INSIDE
(DROF)^{-1} = DLOF
(DBELOW)^{-1} = DABOVE
(DINB)^{-1} = DINF
(ROF)^{-1} = LOF
(BELOW)^{-1} = ABOVE
(INB)^{-1} = INF

Although we allow the user (or preprocessor) to use all of these
relations in generating the graphical representation of the scene, we can
use a simple preprocessing procedure which changes every branch labeled
by the left-hand side of one of the above equations to the label on the
right-hand side and reverses the orientation of the branch. This allows us to use only
the set of right-hand-side relations. Of these relations NEXT, INSIDE, DLOF,
DABOVE, and DINF are called structural relations, which are used in defining
the graphs of objects in the universe. Since in our graphical analysis we
are mainly concerned with these structural relations, we have chosen to use
another preprocessor to generate the branches implied by transitivity. Among
the structural relations only "INSIDE" is transitive, so if our input graph
has branches (A, INSIDE, B) and (B, INSIDE, C) we generate the branch
(A, INSIDE, C), and repeat this process recursively until all the implied branches
are generated. As for the relations LOF, ABOVE, and INF, although they are
transitive we do not generate the implied branches unless they are
explicitly asked for.
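The two preprocessing passes can be sketched in Python as follows. This is only an illustration under stated assumptions: branches are represented as (node, relation, node) triples held in a set, and the function names `normalize` and `close_inside` are hypothetical. The inverse table is the one listed above (NEXT is its own inverse and so needs no entry).

```python
# Left-hand-side relation -> right-hand-side relation, from the table above.
INVERSE = {
    "CONTAIN": "INSIDE", "DROF": "DLOF", "DBELOW": "DABOVE", "DINB": "DINF",
    "ROF": "LOF", "BELOW": "ABOVE", "INB": "INF",
}

def normalize(branches):
    """Relabel left-hand-side relations and reverse the branch orientation."""
    result = set()
    for a, rel, b in branches:
        if rel in INVERSE:
            result.add((b, INVERSE[rel], a))
        else:
            result.add((a, rel, b))
    return result

def close_inside(branches):
    """Add the INSIDE branches implied by transitivity until none are missing."""
    closed = set(branches)
    changed = True
    while changed:
        changed = False
        inside = [(a, b) for (a, rel, b) in closed if rel == "INSIDE"]
        for a, b in inside:
            for b2, c in inside:
                if b == b2 and (a, "INSIDE", c) not in closed:
                    closed.add((a, "INSIDE", c))
                    changed = True
    return closed

scene = [("C", "CONTAIN", "B"), ("B", "CONTAIN", "A"), ("X", "ROF", "Y")]
graph = close_inside(normalize(scene))
```

Here the two CONTAIN branches become (B, INSIDE, C) and (A, INSIDE, B), and the closure adds (A, INSIDE, C); the LOF branch is relabeled but not closed, matching the policy described above.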


5.3  Models and Graph Structure

In defining our graph structure, which contains all the knowledge
necessary for recognition, we start with the definition of the PGS (primitive
graph set). Each primitive region is represented as an n-tuple, where each
index represents the primitive class of that attribute. For example, (3,4)
represents a primitive region which belongs to the third primitive class
with respect to its shape attribute and the fourth primitive class with respect to its
compactness attribute; this region is equivalent to all the regions in
the real world whose shape attribute is 3 and whose compactness attribute
has a value 8 < L < 10. Since each primitive region has a set of attributes,
and for each attribute value it belongs to a primitive class, we can have
as many as 30 x 7 = 210 different primitive regions in our universe.

Each primitive node in a model graph represents a set of primitive
regions which are acceptable as the primitive region represented by that
node. Some primitive graphs (graphs which contain only primitive nodes)
represent meaningful objects, while others are merely parts of higher level
objects. Primitive graphs which do not represent any part of any higher
level object belong to the FGS (final graph set). Using the non-final
primitive graphs and primitive regions we construct the graphs which represent
additional objects in our universe. This process is repeated until the
graphical models of all the objects in the universe are constructed. In Table 5.2
we have shown the objects and their corresponding graphical models. In this
table each super node is represented by its associated graph, and each
primitive node by a set of alternative primitive regions which it represents.
For each alternative the first number inside the parentheses represents the
shape primitive class, and the set of numbers inside the second parentheses
represents the acceptable compactness primitive classes. In this
representation a primitive node can be associated with several different regional
views of the same part of the object.

[Table 5.2. Graphical models of the objects.]

Now that we have defined the graphs in our universe, we can define the
entry points associated with the primitive classes as follows:

Entry points associated with compactness primitive classes:

1: M1,M2,M3,M4,M5,M6,M8,M9,M10,M12,M13,M14,M15,M16,M17,M20,M21,M23,M24,M25,M26,M27,M28,M30,M32,M33,M44
2: M1,M2,M3,M5,M7,M8,M11,M12,M13,M14,M16,M17,M20,M21,M23,M24,M25,M27,M28,M30,M33
3: M3,M4,M6,M9,M10,M11,M12,M13,M14,M16,M21,M24,M28,M32,M33
4: M3,M6,M9,M10,M11,M12,M13,M14,M16,M21,M24,M32,M33
5: M6,M9,M10,M11,M12,M13,M24,M32,M33
6: M6,M9,M10,M11,M12,M13
7: M10,M12,M13

Entry points associated with shape primitive classes:

1: M6,M7,M8,M11,M12,M20,M21,M24,M32
2: M6,M8,M20,M21
3: M3,M24
4: M4,M5,M6,M14,M24,M26,M27,M28,M33
5: M5
6: M5,M10
7: M34
8: M2,M6,M11,M12,M21,M32,M33
9: M11
10: M23
11: M3,M9,M23
12: M3
13: M1,M3,M4,M9,M17,M17,M33
14: M30
15: M30
16: M26
17: M27
18: M1,M12,M15,M16
19: M2,M12
20: M16
21: M2,M14
22: M13
23: M30,M32
24: M33
25: M15
26: M16
27: M33
28: M3,M25,M28,M33
29: M34

As we observe, the shape attribute provides us with rich information to
pinpoint the set of entry points in our parsing graph. With some minor
heuristics we are able to improve on this even further. For example, if two
structurally tied primitive nodes have shape attributes 1 and 18 respectively,
their implied entry points will be

1:  M6,M7,M8,M11,M12,M20,M21,M24,M32
18: M1,M12,M15,M16

and, by intersecting these two sets,

1 and 18: M12.

In this case we find that rule M12 should be tried first.
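The intersection heuristic is easy to express in code. The following Python sketch uses only the two table rows quoted in the example; the dictionary name and the function `implied_entry_points` are illustrative, not names used by the recognizer.

```python
# Rows 1 and 18 of the shape entry-point table above.
SHAPE_ENTRY_POINTS = {
    1: ["M6", "M7", "M8", "M11", "M12", "M20", "M21", "M24", "M32"],
    18: ["M1", "M12", "M15", "M16"],
}

def implied_entry_points(shape_classes, table=SHAPE_ENTRY_POINTS):
    """Rules implied jointly by the shape classes of structurally tied nodes."""
    candidate_sets = [set(table[c]) for c in shape_classes]
    return set.intersection(*candidate_sets)

first_to_try = implied_entry_points([1, 18])
```

For the pair (1, 18) the intersection contains the single rule M12, as in the text.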

Fig.  5.1  shows  the  parsing  graph  of  our  graph  structure;  the  best  way 
to  find  out  how  the  system  works  is  to  go  through  an  example  carefully.   We 
do  this  in  the  next  section. 


[Fig. 5.1. Parsing graph of the graph structure.]



5.4   Careful  Analysis  of  an  Example 

First, to give a general view of the system's performance, we show
the computer results of analyzing the scene in Fig. 5.2. The graphical
representation of this scene is shown in Fig. 5.3. The computer processing
time for this example was 21.40 seconds. This includes the preprocessing
time for the "house" and the rotation of the bird model, which will be discussed
in later sections. The only heuristic used is object association,
meaning that every time we discover an object in the scene we give higher
priority to the objects which can jointly occur with this object in a
natural scene. No backtracking was necessary in this analysis.

Since this scene is rather complex for detailed analysis, we have chosen
one portion of it, shown in Fig. 5.4. The analysis time for this portion
was 7.28 seconds, while the same portion required 8.45 seconds in the more
complex environment of Fig. 5.2. This is mainly due to the linear list
processing technique, which requires more processing time in a larger scene.
Fig. 5.5(a) shows the scene graph after relation preprocessing, which
replaces the inverse relations and creates the branches implied by transitivity
for the relation "INSIDE". The first AP (attention point) pointed out by the user was
region 29 of Fig. 5.4 (node "ND29" of Fig. 5.5(a)). The shape attribute
of this region has the value 23, which implies rules M30 and M32. The value
of the compactness attribute cannot reduce this set any further. Since the
front view of the bird's body (M30) did not match a subgraph in the scene
(starting at "ND29"), M30 was rotated 90° to the left, and matching was
tried again. This time domain A1 of the scene graph matched the
rotated M30. Domain A1 was parsed to a super node "M30", shown in Fig.
5.5(b). Now rule M30 in the parsing graph implies rule M31. The domain
matched with this rule is A3, and in the course of the search for this domain we




Fig.  5.2.   A  scene  example.   Numbers  are  used  as  node 
identifiers  in  the  scene  graph. 


Fig. 5.3.   Graphical representation of Fig. 5.2.


BRIEF DESCRIPTION OF THE SCENE.

1.  BIRD
2.  HOUSE
3.  MAN
4.  CLOUD
5.  HAT
6.  HOUSE
7.  CLOUD
8.  CLOUD
9.  TREE
10. TREE

END OF BRIEF DESCRIPTION.


FULL DESCRIPTION OF THE SCENE.

THE FOLLOWING SUCCESSFUL STEPS WERE TAKEN IN PARSING THE SCENE.

OBJECT *TREE**.
REGION *ND49** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.

OBJECT *TREE**.
REGION *ND*7** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.

OBJECT *CLOUD**.
REGION *ND6t>** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.

OBJECT *CLOUD**.
REGION *ND6^** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.

OBJECT *HOUSE**.
OBJECT *BUILDING**.
REGION *ND53** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.

OBJECT *HAT**.
REGION *ND10** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.

OBJECT *CLOUD**.
REGION *ND40** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.

OBJECT *MAN**.
OBJECT *HAND**.
REGION *ND19** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.

OBJECT *HOUSE**.
OBJECT *ROOF**.
REGION *ND1** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.

REGION *ND21** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.

OBJECT *BIRD**.
OBJECT *BIRD_B**.
REGION *ND29** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.

THIS CONCLUDES THE REPORT OF THE SUCCESSFUL ATTEMPTS.


THE PARSED PICTORIAL INFORMATION IS AS FOLLOWS:

THE PRIMITIVE REGION *ND21** IS LEFT WITHOUT ANY INTERPRETATION.
END OF RELATIONAL DESCRIPTION FOR THIS REGION.

THE PRIMITIVE REGION *ND31** IS LEFT WITHOUT ANY INTERPRETATION.
THE PRIMITIVE REGION *ND31** IS MOST LIKELY A HIDDEN PART OF THE *BIRD**.
END OF RELATIONAL DESCRIPTION FOR THIS REGION.

A *BIRD** IS FOUND IN THE SCENE.
ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THIS *BIRD** IS LOCATED AT THE LEFT OF THE PRIMITIVE REGION *ND21**.
THIS *BIRD** HAS THE STRUCTURAL TIE *DABOVE** TO THE PRIMITIVE
REGION *ND31**. THIS REGION COULD BE A HIDDEN PART OF THE OBJECT.
END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

A *HOUSE** IS FOUND IN THE SCENE.
ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THIS *HOUSE** IS LOCATED AT THE LEFT OF THE *MAN**.
THIS *HOUSE** IS LOCATED BELOW THE *CLOUD**.
END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

A *MAN** IS FOUND IN THE SCENE.
ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THE RELATION *HOLD** BETWEEN THE *MAN** AND THE PRIMITIVE
REGION *ND21** IS NOT KNOWN TO THE RECOGNIZER.
THIS *MAN** IS LOCATED AT THE RIGHT OF THE *HOUSE**.
THIS *MAN** IS LOCATED BELOW THE *HAT**.
THIS *MAN** IS LOCATED AT THE LEFT OF THE SECOND *HOUSE**.
END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

A *CLOUD** IS FOUND IN THE SCENE.
ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THIS *CLOUD** IS LOCATED ABOVE THE *HOUSE**.
THIS *CLOUD** IS LOCATED AT THE LEFT OF THE SECOND *CLOUD**.
END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

A *HAT** IS FOUND IN THE SCENE.
ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THIS *HAT** IS LOCATED ABOVE THE *MAN**.
END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

A SECOND *HOUSE** IS FOUND IN THE SCENE.
ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THIS *HOUSE** IS LOCATED AT THE RIGHT OF THE *MAN**.
THIS *HOUSE** IS LOCATED BELOW THE SECOND *CLOUD**.
THIS *HOUSE** IS LOCATED BELOW THE THIRD *CLOUD**.
THIS *HOUSE** IS LOCATED IN BEHIND OF THE *TREE**.
THIS *HOUSE** IS LOCATED IN BEHIND OF THE SECOND *TREE**.
END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

A SECOND *CLOUD** IS FOUND IN THE SCENE.
ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THIS *CLOUD** IS LOCATED AT THE RIGHT OF THE *CLOUD**.
THIS *CLOUD** IS LOCATED ABOVE THE SECOND *HOUSE**.
THIS *CLOUD** IS LOCATED AT THE LEFT OF THE THIRD *CLOUD**.
END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

A THIRD *CLOUD** IS FOUND IN THE SCENE.
ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THIS *CLOUD** IS LOCATED ABOVE THE SECOND *HOUSE**.
THIS *CLOUD** IS LOCATED AT THE RIGHT OF THE SECOND *CLOUD**.
END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

A *TREE** IS FOUND IN THE SCENE.
ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THIS *TREE** IS LOCATED IN FRONT OF THE SECOND *HOUSE**.
THIS *TREE** IS LOCATED AT THE LEFT OF THE SECOND *TREE**.
END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

A SECOND *TREE** IS FOUND IN THE SCENE.
ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THIS *TREE** IS LOCATED IN FRONT OF THE SECOND *HOUSE**.
THIS *TREE** IS LOCATED AT THE RIGHT OF THE *TREE**.
END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

END OF SCENE DESCRIPTION.


Fig. 5.4.   A subpicture of the picture in Fig. 5.2.


(a)   graphical representation after relation preprocessing
(b)   scene graph after bird's body was parsed (M30)
Fig. 5.5.   Part A.   Parsing process of the scene in Fig. 5.4.

(c)   scene graph after bird was parsed (M31)
(d)   final processed scene graph
Fig. 5.5.   Part B.   Continued parsing of the scene in Fig. 5.4.


had to look for the subdomain A2, which corresponds to the bird's head (M29).
The domains of these parsed super nodes are eliminated from Fig. 5.5(c) for
clarity. Since "M31" corresponds to a final graph (∈ FGS), the search for this
object is complete. The next attention point picked up by the recognizer
was region 9 (node "ND9"), which is an excellent choice since its shape
attribute has the value 10, which implies a single rule, M23 (the hat model). Domain
A4 in Fig. 5.5(c) corresponds to M23 and is parsed as such. Next we
observe the relation "DABOVE" between this hat and region 13 (node "ND13"),
whose shape attribute value (4) implies the following set of rules:
M4, M5, M6, M14, M24, M26, M27, M28, M33.

But since the system is knowledgeable of object associations, it
finds that the man is (semantically) associated with the hat. So the
rules M33, M34, M11, M32 are given higher priority in the list of
possible entry points from the AP (in this case "ND13"). Of these rules only
M33 occurs in the above list, so it has to be tried first. Indeed, this was
the case, and domain A5 was found, which corresponds to M33 (man's head).
Man's head in the parsing graph implies a man (M34). Domain A6 corresponds
to this rule; besides A5, which had already been parsed, the algorithm had
to look for 4 subdomains which correspond to two hands (M32) and two feet
(M11). Domain A6 is parsed to a super node "M34". The recognizer also tried
to parse nodes ND31 and ND21, but was unable to do so. ND31 had a structural
tie to the BIRD, so the recognizer assumed that it must be a partially
hidden part of the rotated bird. The final processed scene graph is shown
in Fig. 5.5(d), which has 5 active nodes. The message routine reports
this final graph. This report is reproduced in the following pages.
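The object-association heuristic amounts to reordering the candidate entry points. A minimal Python sketch, using the rule sets quoted in this example, is given below; the function name `prioritize` and the stable "associated rules first" ordering are assumptions of the sketch, since the text only says that associated rules receive higher priority.

```python
def prioritize(candidates, associated):
    """Move the rules associated with already-found objects to the front,
    keeping the original order within each group."""
    preferred = [rule for rule in candidates if rule in associated]
    remaining = [rule for rule in candidates if rule not in associated]
    return preferred + remaining

# Rules implied by shape class 4 of region 13 (node "ND13").
implied_by_shape = ["M4", "M5", "M6", "M14", "M24", "M26", "M27", "M28", "M33"]
# Rules associated with the hat that was just found.
associated_with_hat = {"M33", "M34", "M11", "M32"}

ordered = prioritize(implied_by_shape, associated_with_hat)
```

With these inputs M33 (man's head) moves to the front of the list and is tried first, as described above.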


BRIEF DESCRIPTION OF THE SCENE.

THE FOLLOWING OBJECTS WERE PARSED IN THE SCENE.

1. BIRD
2. HAT
3. MAN

END OF BRIEF DESCRIPTION.


FULL DESCRIPTION OF THE SCENE.

THE FOLLOWING SUCCESSFUL STEPS WERE TAKEN IN PARSING THE SCENE.

OBJECT *MAN**.
OBJECT *FACE**.
REGION *ND13** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.

OBJECT *HAT**.
REGION *ND9** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.

REGION *ND21** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.

OBJECT *BIRD**.
OBJECT *BIRD_A**.
REGION *ND29** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.

THIS CONCLUDES THE REPORT OF THE SUCCESSFUL ATTEMPTS.


THE PARSED PICTORIAL INFORMATION IS AS FOLLOWS:

THE PRIMITIVE REGION *ND21** IS LEFT WITHOUT ANY INTERPRETATION.
END OF RELATIONAL DESCRIPTION FOR THIS REGION.

THE PRIMITIVE REGION *ND31** IS LEFT WITHOUT ANY INTERPRETATION.
THE PRIMITIVE REGION *ND31** IS MOST LIKELY A HIDDEN PART OF THE *BIRD**.
END OF RELATIONAL DESCRIPTION FOR THIS REGION.

A *BIRD** IS FOUND IN THE SCENE.
ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THIS *BIRD** IS LOCATED AT THE LEFT OF THE PRIMITIVE REGION *ND21**.
THIS *BIRD** HAS THE STRUCTURAL TIE *DABOVE** TO THE PRIMITIVE
REGION *ND31**. THIS REGION COULD BE A HIDDEN PART OF THE OBJECT.
END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

A *HAT** IS FOUND IN THE SCENE.
ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THIS *HAT** IS LOCATED ABOVE THE *MAN**.
END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

A *MAN** IS FOUND IN THE SCENE.
ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THE RELATION *HOLD** BETWEEN THE *MAN** AND THE PRIMITIVE
REGION *ND21** IS NOT KNOWN TO THE RECOGNIZER.
THIS *MAN** IS LOCATED BELOW THE *HAT**.
END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

END OF SCENE DESCRIPTION.



5.5   Other Features of the Recognizer

The recognizer has several features which enable it to
recognize incomplete objects, different views of the objects, and varieties
of the same object. It is also able to cope with preprocessors which
divide scenes into three-dimensional bodies. We will discuss these in the
following subsections.

5.5.1   Recognition  of  incomplete  objects 

In chapter 4 we discussed the best match feature of our recognizer.
Here we give an example of a scene, Fig. 5.6, where one of the legs of the table
and chairs and a dog's foot are hidden (or unrecognizably partially hidden)
from view. For the chairs and the table the search was very expensive
in time, since all the entry points implied by the APs had to be tried
before M6 was picked as the best match. As for the dog, the head was
recognized first, and the body's partial match, with FOM (figure of merit) 5/6,
which is greater than the AFOM (acceptable figure of merit) of 0.75, was recognized
immediately. The recognizer also realized that region 30 has a structural
tie to this partially matched body and assumed that this must be a partially
hidden leg. The report of this experiment is reproduced in the following
pages.
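The acceptance test for a partial match can be written down in a few lines of Python. The sketch assumes that the figure of merit is the fraction of model nodes that were matched, which is consistent with the value 5/6 quoted above but is not stated in this chapter; the function names are illustrative.

```python
from fractions import Fraction

AFOM = Fraction(3, 4)  # acceptable figure of merit (0.75) used in this experiment

def figure_of_merit(matched_nodes, model_nodes):
    """Assumed definition: fraction of the model's nodes found in the scene."""
    return Fraction(matched_nodes, model_nodes)

def is_acceptable(fom, afom=AFOM):
    """A partial match is accepted when its FOM is greater than the AFOM."""
    return fom > afom

dog_body_fom = figure_of_merit(5, 6)
```

With 5 of 6 nodes matched the dog's body passes the test, while a match of 4 out of 6 nodes would be rejected.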


Fig. 5.6.   A scene where objects have (partially) hidden unrecognizable parts.


BRIEF DESCRIPTION OF THE SCENE.

THE FOLLOWING OBJECTS WERE PARSED IN THE SCENE.

1. DOG
2. TABLE
3. CHAIR
4. CHAIR
5. CHAIR

END OF BRIEF DESCRIPTION.


FULL DESCRIPTION OF THE SCENE.

THE FOLLOWING SUCCESSFUL STEPS WERE TAKEN IN PARSING THE SCENE.

OBJECT *CHAIR**.
OBJECT *MODEL6**.
REGION *ND18** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.

OBJECT *CHAIR**.
OBJECT *MODEL6**.
REGION *ND4** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.

OBJECT *CHAIR**.
OBJECT *MODEL6**.
REGION *ND15** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.

OBJECT *TABLE**.
OBJECT *MODEL6**.
REGION *ND10** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.

OBJECT *DOG**.
OBJECT *DOG_H**.
REGION *ND21** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.

THIS CONCLUDES THE REPORT OF THE SUCCESSFUL ATTEMPTS.


THE PARSED PICTORIAL INFORMATION IS AS FOLLOWS:

THE PRIMITIVE REGION *ND31** IS LEFT WITHOUT ANY INTERPRETATION.
THE PRIMITIVE REGION *ND31** IS MOST LIKELY A HIDDEN PART OF THE *DOG**.
END OF RELATIONAL DESCRIPTION FOR THIS REGION.

A *DOG** IS FOUND IN THE SCENE.
ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THIS *DOG** HAS THE STRUCTURAL TIE *DABOVE** TO THE PRIMITIVE
REGION *ND31**. THIS REGION COULD BE A HIDDEN PART OF THE OBJECT.
THIS *DOG** IS LOCATED IN FRONT OF THE *TABLE**.
END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

A *TABLE** IS FOUND IN THE SCENE.
ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THIS *TABLE** IS LOCATED IN BEHIND OF THE *DOG**.
THIS *TABLE** IS LOCATED AT THE LEFT OF THE *CHAIR**.
THIS *TABLE** IS LOCATED IN FRONT OF THE SECOND *CHAIR**.
THIS *TABLE** IS LOCATED AT THE RIGHT OF THE THIRD *CHAIR**.
END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

A *CHAIR** IS FOUND IN THE SCENE.
ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THIS *CHAIR** IS LOCATED AT THE RIGHT OF THE *TABLE**.
END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

A SECOND *CHAIR** IS FOUND IN THE SCENE.
ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THIS *CHAIR** IS LOCATED IN BEHIND OF THE *TABLE**.
END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

A THIRD *CHAIR** IS FOUND IN THE SCENE.
ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THIS *CHAIR** IS LOCATED AT THE LEFT OF THE *TABLE**.
END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

END OF SCENE DESCRIPTION.


5.5.2   Scenes  with  varieties  of  the  same  object 

In  describing  the  graphical  models  of  the  objects  in  the  universe  we 
allowed  that  more  than  one  class  of  primitive  region  be  associated  with  a 
primitive  node.   In  testing  the  semantical  equivalence  of  primitive  nodes, 
all  these  alternatives  are  tried  and  a  single  equivalence  establishes  the 
match.   In  Fig.  5.7  we  have  shown  a  scene  with  different  varieties  of  the 
object  "KNIFE".   The  results  of  analyzing  this  scene  are  shown  in  the 
following  pages.   Of  course  it  would  be  trivial  to  add  the  actual  shape 
information  about  each  part  of  the  objects  to  this  report. 

We  can  also  easily  extend  this  idea  to  super  nodes  and  have  more  than 
one  object  (graphical  rule)  associated  with  each  super  node. 
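The test for semantic equivalence between a scene region and a model primitive node with several alternatives can be sketched in Python as follows. The representation follows the (shape class, compactness classes) notation described for Table 5.2, but the concrete class numbers in the example, and the name `matches_primitive_node`, are hypothetical.

```python
def matches_primitive_node(region, alternatives):
    """A scene region (shape_class, compactness_class) matches a model node if
    any one alternative (shape_class, {acceptable compactness classes}) fits."""
    shape, compactness = region
    return any(
        shape == alt_shape and compactness in alt_compactness
        for alt_shape, alt_compactness in alternatives
    )

# Hypothetical model node for a knife blade with two acceptable varieties.
blade_alternatives = [(8, {5, 6}), (2, {4, 5})]

narrow_blade = (8, 6)  # matches the first alternative
other_blade = (2, 4)   # matches the second alternative
round_region = (1, 1)  # matches neither
```

A single matching alternative is enough to establish the equivalence, which is what lets one model graph cover several varieties of the same object.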


Fig. 5.7.   A scene with varieties of the same object.


BRIEF DESCRIPTION OF THE SCENE.

THE FOLLOWING OBJECTS WERE PARSED IN THE SCENE.

1. KNIFE
2. KNIFE
3. KNIFE
4. SPOON
5. CUP
6. CARROT

END OF BRIEF DESCRIPTION.


FULL DESCRIPTION OF THE SCENE.

OBJECT *CARROT**.
REGION *N16** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.

OBJECT *CUP**.
REGION *N1*** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.

OBJECT *SPOON**.
REGION *N11** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.

OBJECT *KNIFE**.
REGION *N4** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.

OBJECT *KNIFE**.
REGION *N9** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.

OBJECT *KNIFE**.
REGION *N1** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.

THIS CONCLUDES THE REPORT OF THE SUCCESSFUL ATTEMPTS.


128 
THE  £A&S£Q  £I£IQ£IAL  IUEQEMAIIQU  IS  AS  EDLLQUS: 

THE  PRIMITIVE  REGION  *N12**  IS  LEFT  WITHOUT  ANY  INTERPRETATION, 

THE  PRIMITIVE  REGION  *N12**  IS  MOST  LIKELY  A  HIDDEN  PART  Of  THE 
*CUP**. 

£UQ  Q£  RELATIONAL  D£SCai£IIDIil  £J3E  IdlS  EE6IQN. 

A       *KNIFE**    IS    FOUND    IN    THE    SCENE. 

IIS  &ELAULQUS  IQ  ItiE  QltlEE  EAEIS  fl£  ItiE  5C£N£  ARE  AS  EULLQiiS: 
THIS    *KNlFE**    IS    LOCATED    AT    THE    RIGHT    OF    THE    SECOND      *KNIFE**. 
THIS    *KNIFE**    IS    LOCATED    AT    THE    LEFT    OF    THE    THIRD         *KNIFE**. 

ENG  fl£  EELAIIGNAL  DESLRIELIflN,  £QR  IiJIS  Q2JELL 

A    SECOND      +KNIFE**    IS    FOUND    IN    THE    SCENE. 

IIS  RELAIIOHS  10  ItiE  QltiEfi  EARIS  fl£  ItiE  S£Efci£  ARE  AS  £QLLQMS: 
THIS    *KNIFE**    IS    LOCATED    AT    THE    LEFT    OF    THE      *KNIFE**. 

ENO  QE  RELAIICUAL  aESLRIRIIOU  £2R  IMIS  QaJEkL. 

A    THIRD         *KNIFE**    IS    FOUND    IN    THE    SCENE. 

IIS  REEAUQUS  IQ  IM£  QltiER  EARIS  Q£  IME  S££U£  ARE  AS  EOLLDMS: 
THIS    *KNIFE**    IS    LOCATED    AT    THE    RIGHT    OF    THE       *KNIFE**. 
THIS    *KNIFE**    IS    LOCATED    AT    THE    LEFT    OF    THE      *SPOON**. 

END  Qt  REEAIIONAL  CESL£I£IIflN  £DR  ItllS  Q£JL££I. 
A       *SPOON**    IS    FOUND    IN    THE    SCENE. 


ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:

THIS *SPOON** IS LOCATED AT THE RIGHT OF THE THIRD *KNIFE**.
THIS *SPOON** IS LOCATED AT THE RIGHT OF THE *CUP**.

END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

A   *CUP**  IS  FOUND  IN  THE  SCENE. 

ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:

THIS *CUP** HAS THE STRUCTURAL TIE *INSIDE** TO THE PRIMITIVE
REGION *N12**. THIS REGION COULD BE A HIDDEN PART OF THE OBJECT.

THIS  *CUP**  IS  LOCATED  AT  THE  LEFT  OF  THE   *SPOON**. 
THIS  *CUP**  IS  LOCATED  AT  THE  LEFT  OF  THE   *CARROT**. 

END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

A *CARROT** IS FOUND IN THE SCENE.

ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THIS *CARROT** IS LOCATED AT THE RIGHT OF THE *CUP**.

END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.
END OF SCENE DESCRIPTION.


5.5.3   Preprocessing  of  the  scene  graph 

The nodes in graphical rules correspond to the meaningful parts of
the objects. These nodes normally represent the regional views of the
parts. In actual 3-dimensional scenes, however, there are cases where more
than one regional view of a part is visible. We therefore need a class of
rules which operate on the actual scenes and make them conform to our
one-node representation of parts. This is accomplished through MERGING of
the nodes in the scene graph.

Guzman [19] pioneered the discovery of rules for dividing scenes into
3-dimensional bodies. Fig. 5.8(a) shows two different views of the same
"house", where several regions of each part are visible. Using simple vertex
properties like "T-joint" and "Y-joint", we can find the collection of
regions which correspond to each part. Then we can merge these nodes into
a node equivalent to the primitive node in the model graph. In the merging
process the structural relations are embedded irredundantly, and the
association of the newly created node is simply the union of the associations
of the merged nodes.
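The merging step can be illustrated with a minimal Python sketch. It is not the SOL implementation used in this work: the class `SceneGraph`, the triple encoding of relations and the labels in the example are assumptions made only for illustration. It shows the two properties stated above, namely that the association of the new node is the union of the merged associations and that relations are re-embedded without duplicates.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    # node name -> set of associations (attribute values, tags, ...)
    assoc: dict = field(default_factory=dict)
    # structural relations stored as (source, label, target) triples
    relations: set = field(default_factory=set)

    def merge(self, a, b, new_name):
        """Merge nodes a and b into a single node called new_name."""
        # The association of the new node is the union of both associations.
        self.assoc[new_name] = self.assoc.pop(a) | self.assoc.pop(b)
        embedded = set()
        for src, label, dst in self.relations:
            src = new_name if src in (a, b) else src
            dst = new_name if dst in (a, b) else dst
            if src != dst:  # drop relations internal to the merged part
                embedded.add((src, label, dst))
        # Using a set keeps each re-embedded relation only once.
        self.relations = embedded
        return new_name


# Hypothetical example: two visible regions of one roof above a wall.
g = SceneGraph(
    assoc={"N1": {"roof-left"}, "N2": {"roof-right"}, "N3": {"wall"}},
    relations={("N1", "next", "N2"), ("N1", "above", "N3"), ("N2", "above", "N3")},
)
g.merge("N2", "N1", "GEN11")
```

After the call, the two "above" relations collapse into one relation from `GEN11` to `N3`, and the relation between the two merged regions disappears.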

Since the parts of a house are symmetric, the graphical representation
of both views will be the same (Fig. 5.8(b)).

The results of parsing this scene are reproduced in the following pages.
The first page gives the merging operations which are performed before the
recognizer is called.

In Appendix B we present another example of this type, where
the regions of a "chair" have been divided into several subregions.


[Figure 5.8: (a) picture, with numbered regions; (b) input graph.]

Fig. 5.8.   An example of a house with complex subparts


NODES N2 AND N1 ARE MERGED INTO ONE NODE, NAMED GEN11

NODES N7 AND N6 ARE MERGED INTO ONE NODE, NAMED GEN12

NODES N5 AND N3 ARE MERGED INTO ONE NODE, NAMED GEN13

NODES N10 AND N9 ARE MERGED INTO ONE NODE, NAMED GEN14

NODES N12 AND ND11 ARE MERGED INTO ONE NODE, NAMED GEN15

NODES N14 AND Nil ARE MERGED INTO ONE NODE, NAMED GEN16

NODES N16 AND N15 ARE MERGED INTO ONE NODE, NAMED GEN17

NODES N18 AND N17 ARE MERGED INTO ONE NODE, NAMED GEN18

NODES N4 AND GEN11 ARE MERGED INTO ONE NODE, NAMED GEN19

NODES N8 AND GEN12 ARE MERGED INTO ONE NODE, NAMED GEN20

NODES  N13  AND  GEN15  ARE  MERGED  INTO  ONE  NODE, NAMED  GEN21 

NODES  GEN18  AND  GEN17  ARE  MERGED  INTO  ONE  NODE, NAMED  GEN22 

NODES  GEN16  AND  GEN21  ARE  MERGED  INTO  ONE  NODE, NAMED  GEN23 


THE FOLLOWING OBJECTS WERE PARSED IN THE SCENE
1.    HOUSE

END OF BRIEF DESCRIPTION.


FULL DESCRIPTION OF THE SCENE.

THE FOLLOWING SUCCESSFUL STEPS WERE TAKEN IN PARSING THE SCENE.
OBJECT    *HOUSE**. 

OBJECT    *BUILDING**. 

REGION *GEN13** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.

THIS CONCLUDES THE REPORT OF THE SUCCESSFUL ATTEMPTS.

THE PARSED PICTORIAL INFORMATION IS AS FOLLOWS:

A      *HOUSE**    IS    FOUND    IN    THE    SCENE. 

ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:

END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

END OF SCENE DESCRIPTION.


5.5.4  Recognition  of  the  different  views  of  an  object 

In chapter 4 we introduced a class of transformations $T_{R90}$, $T_{R180}$
and $T_{R270}$, which operate on the graphical models of the objects and produce
the models of different views of that object. In representing a part of
an object as a primitive node in the graphical rule, we include all different
regional views of that part, tagged with angular information, as acceptable
primitive regions for that primitive node. Besides the attributes discussed
earlier, we also associate a "relative size" attribute with each node, which
gives the size of that part relative to the other parts of the object.

Now, producing different views of an object is merely a matter
of applying the transformations $T_{R90}$, $T_{R180}$ or $T_{R270}$ to the set of
binary relations in the graph and deleting any nodes which at the new
position are directly behind a node of greater relative size.
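A rough Python sketch of this idea follows. The relation labels, their cyclic order under a 90-degree turn and the function name `rotate` are assumptions chosen for illustration; the actual relation set and transformation rules are those defined in chapter 4. The sketch relabels the binary relations and then deletes every node that ends up directly behind a node of greater relative size.

```python
# Hypothetical relation labels; one quarter turn moves each label to the next.
CYCLE = ["left-of", "in-front-of", "right-of", "behind"]


def rotate(relations, sizes, quarter_turns):
    """Rotate (a, label, b) triples by quarter_turns * 90 degrees.

    Returns the surviving relations and the set of nodes deleted because
    they are directly behind a node of greater relative size.
    """
    rotated = set()
    for a, label, b in relations:
        if label in CYCLE:
            label = CYCLE[(CYCLE.index(label) + quarter_turns) % 4]
        rotated.add((a, label, b))
    hidden = {a for a, label, b in rotated
              if label == "behind" and sizes[b] > sizes[a]}
    visible = {(a, label, b) for a, label, b in rotated
               if a not in hidden and b not in hidden}
    return visible, hidden


# A bird model seen from the rear (180 degrees): the head falls behind the body.
model = {("head", "in-front-of", "body"), ("tail", "behind", "body")}
sizes = {"head": 1, "tail": 1, "body": 3}
visible, hidden = rotate(model, sizes, quarter_turns=2)
```

In this example the head is removed from the rotated model, so a matcher working from the transformed rule would not search for it.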

Fig.  5.9  shows  four  different  views  of  a  bird.   In  Fig.  5.10  we  have 
shown  the  results  of  applying  these  transformations  to  the  model  of  bird's 
body.   Here  we  have  introduced  another  transformation  which  merely  deletes 
the  hidden  parts  from  the  original  model.   When  a  part  of  the  object  is  recog- 
nized we  also  will  have  the  orientation  information,  and  in  further  parsing 
we  will  use  this  information  to  transform  the  rules  before  any  matching  is 
attempted. 

In the following pages we have reproduced the result of analyzing the
scene of Fig. 5.9. For example, in recognition of the bird's rear end, the
body is first recognized with 180° orientation. In an attempt to find a match
for the bird, which consists of body and head, the rule is first transformed
by $T_{R180}$, and the super node corresponding to the head is eliminated. So no
further search for the head will be necessary.

It is also trivial to produce this rotational information in the report.


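[Figure 5.9: four views of a bird.]

Fig. 5.9.   Four different views of a bird

[Figure 5.10: rule M30 and its transformed versions under TR0, TR90, TR180 and TR270; nodes are annotated with their relative size.]

Fig. 5.10.   Application of rotational transformations to
rule M30 (bird's body).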


BRIEF DESCRIPTION OF THE SCENE.

THE FOLLOWING OBJECTS WERE PARSED IN THE SCENE.

1.  BIRD

2.  BIRD

3.  BIRD

4.  BIRD

END OF BRIEF DESCRIPTION.


FULL DESCRIPTION OF THE SCENE.

THE FOLLOWING SUCCESSFUL STEPS WERE TAKEN IN PARSING THE SCENE.
REGION   *N11**    IS    THE    ATTENTION    POINT    OF    PARSING    THE   ABOVE    STEPS. 

OBJECT    *BIRD**. 

OBJECT    *BIRD_B**. 

REGION    *N25**    IS    THE    ATTENTION    POINT    OF    PARSING    THE    ABOVE    STEPS. 

OBJECT    *BIRD**. 

OBJECT    *BIRD_B**. 

REGION  *N15**  IS  THE  ATTENTION  POINT  OF  PARSING  THE  ABOVE  STEPS. 

OBJECT    *BIRD**.

OBJECT    *BIRD_B**. 

REGION  *N7**  IS  THE  ATTENTION  POINT  OF  PARSING  THE  ABOVE  STEPS. 

OBJECT    *BIRD**. 

OBJECT    *BIRD_B**. 

REGION  *N2**  IS  THE  ATTENTION  POINT  OF  PARSING  THE  ABOVE  STEPS. 

THIS CONCLUDES THE REPORT OF THE SUCCESSFUL ATTEMPTS.

THE PARSED PICTORIAL INFORMATION IS AS FOLLOWS:

THE PRIMITIVE REGION *N11** IS LEFT WITHOUT ANY INTERPRETATION.

THE PRIMITIVE REGION *N11** IS MOST LIKELY A HIDDEN PART OF THE
SECOND *BIRD**.

END OF RELATIONAL DESCRIPTION FOR THIS REGION.

THE PRIMITIVE REGION *N29** IS LEFT WITHOUT ANY INTERPRETATION.

THE PRIMITIVE REGION *N29** IS MOST LIKELY A HIDDEN PART OF THE
FOURTH *BIRD**.

END OF RELATIONAL DESCRIPTION FOR THIS REGION.

A *BIRD** IS FOUND IN THE SCENE.

ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THIS *BIRD** IS LOCATED AT THE RIGHT OF THE SECOND *BIRD**.

END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

A  SECOND   *BIRD**  IS  FOUND  IN  THE  SCENE. 

ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:

THIS *BIRD** HAS THE STRUCTURAL TIE *DABOVE** TO THE PRIMITIVE
REGION *N11**. THIS REGION COULD BE A HIDDEN PART OF THE OBJECT.

THIS *BIRD** IS LOCATED AT THE LEFT OF THE *BIRD**.

THIS *BIRD** IS LOCATED ABOVE THE THIRD *BIRD**.

THIS *BIRD** IS LOCATED AT THE LEFT OF THE FOURTH *BIRD**.

END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.
A THIRD *BIRD** IS FOUND IN THE SCENE.


ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THIS *BIRD** IS LOCATED BELOW THE SECOND *BIRD**.

END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

A FOURTH *BIRD** IS FOUND IN THE SCENE.

ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:

THIS *BIRD** HAS THE STRUCTURAL TIE *DABOVE** TO THE PRIMITIVE
REGION *N29**. THIS REGION COULD BE A HIDDEN PART OF THE OBJECT.

THIS *BIRD** IS LOCATED AT THE RIGHT OF THE SECOND *BIRD**.
END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.
END OF SCENE DESCRIPTION.


5.6   Observations

Because a great many contextual factors affect the recognition
speed, we would need to experiment with many examples to produce any
meaningful statistics.

The experiments have been carried out using a 360/75 and our
implementation of SOL. The graph generation and relation preprocessing are quite
fast and in most cases took less than one second of execution time. Since
our SOL execution-time routines operate on PL/1 pointers rather than offsets,
we had to read the graph structure of our universe from the saved area for
each experiment. This takes approximately 9 seconds.

The recognizer was written in SOL; its generated PL/1 program is about
950 statements long. The subgraph matching routine was also written in SOL,
and its generated PL/1 program is 600 statements long. There are a dozen
other SOL programs whose algorithms have been discussed earlier; they are
very short and straightforward, provided that they are written in SOL.

In  Table  5.3  we  have  shown  the  results  of  a  few  experiments,  some  of 
them  mentioned  earlier.   Slow-core  has  been  used  in  these  experiments. 

In Fig. 5.11 we have plotted the recognition time of the scenes versus the
number of nodes in the scene graph. Here we have excluded the examples which
would need extra time to use special features of the recognizer. For example,
experiment 7 would involve extensive rotations. With a fixed universe, this
graph should be linear, but here it is slightly curved upward because the
increased number of nodes slows down the linear list searches.

In Fig. 5.12 we have plotted the average processing time for each object
against the average number of nodes per object of the scenes. Here again we
have excluded the examples which have used the semantic association to reduce
the recognition time and the examples with extensive rotations. This curve is
also approximately linear, because our domain search algorithm discussed in
chapter 4 is basically linear.


Experiment   # of     # of      Go step    Recognition   Figure
ID           nodes    objects   time (s)   time (s)      ID

1              5        1        10.67       1.69
2              7        1        13.57       3.59
3             17        6        12.94       4.08        Fig. 5.7
4*            27        9        22.20      13.32
5             30        3        17.35       7.28        Fig. 5.4
6*            30        4        36.69      26.50        Fig. 5.6
7*            32        4        23.19      13.43        Fig. 5.9
8             50        9        35.07      21.40        Fig. 5.2
9             70       15        39.72      28.42        Appendix B

Table 5.3.   Tabulated results of experiments.
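The per-object quantities plotted in Fig. 5.12 can be recomputed from Table 5.3 with a short Python sketch. The values are copied from the table; the function name `per_object` is hypothetical, and the sketch computes the averages for all nine experiments, since the text does not list exactly which ones were left out of the plot.

```python
# (experiment id, nodes, objects, recognition time in seconds), from Table 5.3
TABLE_5_3 = [
    ("1", 5, 1, 1.69), ("2", 7, 1, 3.59), ("3", 17, 6, 4.08),
    ("4*", 27, 9, 13.32), ("5", 30, 3, 7.28), ("6*", 30, 4, 26.50),
    ("7*", 32, 4, 13.43), ("8", 50, 9, 21.40), ("9", 70, 15, 28.42),
]


def per_object(rows):
    """Return {id: (nodes per object, recognition seconds per object)}."""
    return {eid: (nodes / objects, seconds / objects)
            for eid, nodes, objects, seconds in rows}


averages = per_object(TABLE_5_3)
for eid, (nodes_per_obj, secs_per_obj) in averages.items():
    print(f"{eid:>3}  {nodes_per_obj:6.2f} nodes/object  {secs_per_obj:6.2f} s/object")
```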


[Figure 5.11: plot against NUMBER OF NODES (horizontal axis, 20 to 70); the plotted points are labeled with experiment IDs.]

Fig. 5.11.   Speed of recognition via # of nodes in the scene graph.

[Figure 5.12: plot against AVERAGE NUMBER OF NODES/OBJECT (horizontal axis, 3 to 10); the plotted points are labeled with experiment IDs.]

Fig. 5.12.   Average processing time per object via # of nodes/object


6.   LEARNING 

Another  important  factor  in  any  intelligent  system  is  the  ability  to 
learn.   One  aspect  of  artificial  intelligence  which  has  been  more  disturb- 
ing to  outsiders  than  to  insiders  has  been  the  apparently  small  degree  of 
"learning"  in  the  programs  designed  to  solve  the  problems.   These  programs 
do  not  learn  how  to  solve  the  problems;   the  methods  are  built-in.   Of 
course,  the  matter  is  relative  to  one's  goals:   to  make  a  machine  with 
intelligence  is  not  necessarily  to  make  a  machine  that  learns  to  be 
intelligent. 

Many of the programs which people have not considered to be "learning"
programs have an enormous "learning potential" just below the surface. Consider
the qualitative effect upon the subsequent performance of Bobrow's
STUDENT [23] of telling it that "distance equals speed times time"! That
one experience alone enables it to handle a large new portion of high school
algebra: the physical position-velocity-time problems. It is important
not to get into the habit, suggested by modern work in psychology, of
concentrating only on the kinds of "learning" that appear as slow-improvement-
attendant-upon-sickeningly-often-repeated experience! Bobrow's program does not
have any cautious statistical devices that have to be told something over and
over again, so its learning is too brilliant to be called so.

Looking at it this way should clarify why we have come to feel that
questions like "Why don't you put some learning into your program?" are much
less sensible and straightforward than they may seem. In the early experiments
in cybernetics the program's abstract knowledge was very small and
most decisions were based upon the values of simple explicit parameters.
When things are done that way one can always build in a crude sort of
adaptive behavior by using any of a number of correlation-like "reinforcement"
schemes. There is no reason to suppose that anything like this is
appropriate for "thinking". In thinking, the result of an intellectual
experience is used not simply to adjust a parameter but to construct a new
way to represent something, or even to make a change in an administrative
aspect of the problem-solving control system. Before it is profitable to
attempt this, we need more experience with systems that at least partially
analyze their own problem-solving experiences. Then we will probably discover
the general principles underlying these highly intelligent activities.
One should not expect, however, to find problem-solving generality
through the discovery of a single, magnificently general problem-solving
method. Even humans do not have unfailing success in new areas; the acquisition
of a new problem-solving method is a major event in our own cultural
evolution. Our approach here is to understand how people can learn what
they are told and to model these processes by programming computers to
achieve the same goal.

We  define  the  learning  ability  of  a  program  in  a  narrower  sense  as 
"performance  improvement  from  one  run  to  another".   Our  recognizer,  as  we 
stated  before,  functions  in  two  different  modes,  namely  learning  and  non- 
learning  modes. 

In  the  non-learning  mode,  the  saved  information  is  statistical  in  nature, 
and  can  affect  only  the  ordering  of  the  generated  lists  in  the  recognition 
process.   This  is  a  slow  learning  process  through  the  accumulation  of  the 
analyzed  results  of  the  scene  examples.   These  statistics  can  be  classified 
in  three  different  categories  as  follows: 

a.   There is a set of pointers (pointing to entry points) associated with
each primitive class. By assigning a frequency attribute to each of
these pointers, we can order this set of entry points according to
the values of these attributes. At each successful attempt the recognizer
will increase the value of this attribute for all the pointers in
the involved primitive classes pointing to this entry point. This in
effect will change the ordering of the set of entry points implied by
each primitive class, and cause the more frequently occurring entry
points to be tried first. This will improve the recognition performance
in a repetitious environment.

b.  We can also have these frequency attributes associated with the branches
in our parsing graph. At each successful attempt the value of this
attribute for the branch which led us to this successful graphical rule
will be increased. Now, if we order the set of successors according
to the value of the frequency attribute of the branch leading to each
successor, we will be trying the rules in an order in which the more
frequently occurring rules are tried first. This will also improve the
recognition performance.

c.  One of the useful heuristics in our recognition algorithm was the
semantic association between the objects of the universe. For each
object, a set of pointers to the objects (graphical rules) which
commonly occur with it in natural scenes is associated
with its graphical representation. We can have the frequency attributes
associated with these pointers and increase the value of this attribute
for those pointers which led us to objects that actually occurred in
conjunction with this object. This again can affect the order in which
these objects will be tried, so that the system will adapt itself to
the environment.
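The frequency-ordering idea common to items a to c can be sketched in Python as follows. This is an illustration of category (a) only, under assumed names: `EntryPointIndex`, the primitive class "elongated" and the rule names in the example are hypothetical, and the counters stand in for the frequency attributes stored on the pointers.

```python
from collections import defaultdict


class EntryPointIndex:
    """Entry points per primitive class, ordered by a frequency attribute."""

    def __init__(self):
        # primitive class -> {entry point (rule name): frequency}
        self.freq = defaultdict(dict)

    def add(self, primitive_class, entry_point):
        self.freq[primitive_class].setdefault(entry_point, 0)

    def candidates(self, primitive_class):
        """Entry points to try, most frequently successful first."""
        counts = self.freq[primitive_class]
        # sorted() is stable, so ties keep their original order.
        return sorted(counts, key=counts.get, reverse=True)

    def record_success(self, primitive_classes, entry_point):
        """After a successful attempt, increase every involved pointer."""
        for pc in primitive_classes:
            if entry_point in self.freq[pc]:
                self.freq[pc][entry_point] += 1


index = EntryPointIndex()
for rule in ("KNIFE", "SPOON", "CUP"):
    index.add("elongated", rule)
index.record_success(["elongated"], "SPOON")
order = index.candidates("elongated")
```

The same counter-and-sort pattern would apply to branches of the parsing graph (b) and to semantic association pointers (c); only the keys change.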


We see that even in the non-learning mode the system is adaptive and
increases its recognition performance for frequently occurring scenes. The
amount of saved information in this mode is negligible compared to that in the
learning mode.

In the learning mode our recognizer is able to modify the existing
objects, add or delete objects from the universe, and save variations of an
object acquired from incomplete matches for use or merger in future
experiments. In the following subsections we discuss these functions of our
recognizer.

6.1   Addition or Deletion of Objects from the Universe

Our universe of objects (graph structure) can be easily modified to
include new objects, generalized to recognize a larger class of the same
object, or restricted, by adding constraints to the existing models, to
exclude near-miss examples of existing objects.

Introducing  a  new  object  to  our  universe  can  be  accomplished  in  several 
steps.   If  the  object  is  complex  it  should  be  learned  part  by  part.   In  the 
case  of  simple  objects  all  the  existing  rules  are  bound  to  fail,  so  the 
initial  scene  graph  is  saved  as  the  model  for  that  object.   In  Fig.  6.1(a) 
we  have  shown  a  fish  and  its  graphical  representation.   Conceptual  formation 
(modeling)  of  the  new  object  can  be  formalized  in  the  following  manner: 

a)  The input graph is parsed normally, except that the backup procedure
is not invoked. In other words, for all parts of the input graph
we try to climb as high as possible in the parsing graph.

b)  The reduced graph is given a distinct rule name and saved in the graph
structure.

c)  A node is created in the parsing graph. A set of out-going branches
is created between this node and the nodes representing the graphical
rules associated with the nodes of the newly formed graph.

d)  Semantical information provided by the teacher is saved in an area
pointed to by this new rule (name, object association, etc.).

[Figure 6.1: (a) fish; (b) graphical representation of fish; (c) girl; (d) final reduced graph for the girl.]

Fig. 6.1.   Addition of new objects to the universe.
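Steps b) to d) can be sketched in Python as below. Step a), parsing without backup, is not modeled: the sketch starts from an already reduced graph. The dictionary layout of `universe`, the function name `add_object`, the rule name "M40", the node names and the relation labels are all assumptions made for illustration, and the branch direction (from the new node to the rules of its parts) is one possible reading of step c).

```python
def add_object(universe, rule_name, nodes, relations, info):
    """Save a reduced graph as a new rule (steps b to d).

    nodes maps each node of the reduced graph to the name of the rule
    recognized for it, or to None for an uninterpreted primitive region.
    """
    if rule_name in universe["rules"]:
        raise ValueError(f"rule name {rule_name!r} is not distinct")
    # b) save the reduced graph under a distinct rule name
    universe["rules"][rule_name] = {"nodes": dict(nodes),
                                    "relations": set(relations)}
    # c) new parsing-graph node with branches to the rules of its nodes
    universe["parsing_graph"][rule_name] = sorted(
        {rule for rule in nodes.values() if rule is not None})
    # d) semantic information supplied by the teacher
    universe["semantics"][rule_name] = dict(info)


universe = {"rules": {}, "parsing_graph": {}, "semantics": {}}
add_object(
    universe, "M40",  # hypothetical rule name
    nodes={"head": "M33", "hand1": "M32", "hand2": "M32", "N4": None},
    relations={("head", "above", "N4"), ("hand1", "next", "N4"),
               ("hand2", "next", "N4")},
    info={"name": "girl"},
)
```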

This will enable our system to recognize the new object in subsequent
experiments. In Fig. 6.1(c) we have shown a girl. After applying our
normal parsing procedure we have identified a head (M33), two hands (M32's)
and two feet (M11's). The final graph is shown in Fig. 6.1(d).

Another  learning  experiment  for  the  system  would  be  to  present  it  with 
an  object  and  tell  it  this  is  an  object  known  to  the  system  or  this  is  a 
near-miss  to  an  object.   In  both  cases  the  final  reduced  graph  of  the  object 
is  combined  with  the  existing  model.   In  the  first  case  the  combined  model 
should  encompass  (match)  the  final  reduced  graph.   In  the  near-miss  case  the 
model  should  be  modified  in  a  manner  which  does  not  match  the  reduced  (near- 
miss)  input  graph,  but  will  match  an  already  existing  model. 

Model generalization can be done easily by adding optional nodes and
branches to the graphical rule or, if necessary, by making some of the existing
branches optional or changing their labels. For example, if a region was known
to be directly above another region, but in the input scene it happened to
be directly below that region, the branch would be relabeled as next-to.
Restriction can be carried out just as easily by adding new nodes and branches,
making some optional elements imperative, or relabeling some of the branches.

An optional element in a graph is an element which will be sought in
the domain matching, but whose absence will not affect the outcome of
the match. For example, if the girl in Fig. 6.1(c) were presented as an example
of a man, two optional nodes and branches would be added to the man's model
(M34), and a set of attribute values would be added to the associated values
of the primitive node representing the midsection, to make region 4 an
acceptable match to this primitive node.

6.2   Saving  the  Incomplete  Domains  of  the  Rules 

In the teaching process there are cases when we want the system to
learn from the examples, but we do not provide it with additional information
as to how this learning should take place. In this case the system will
proceed blindly and find the rule which has the best match. If this
best match is acceptable according to some criterion (AFOM, etc.), the
recognizer will proceed as though the match were complete. Now, if the parsing
was successful and a proper object was recognized in the scene, the domains
of these best matches are saved as variations of the best-matched graphical
rules as follows:

a)  The  domain  of  the  best  match  is  saved  in  a  graph. 

b)  A  new  node  is  created  in  the  parsing  graph  and  has  this  newly  formed 
graph  associated  with  it. 

c)  A branch (link) is created between this node and the node of the
best-matched rule, indicating that this new node represents a satellite of
that rule.

Now,  if  with  each  rule  we  try  its  satellites  as  alternatives,  we  can 
avoid  the  extensive  searching  process  for  these  known  best  matches.   Using 
some  appropriate  criterion  we  can  later  on  combine  any  of  these  satellites 
with  the  linked  graphical  rule  or  delete  them  from  our  graph  structure.   This 
experiment  was  carried  out  for  the  scene  of  Fig.  5.6,  and  recognition  time 
was  reduced  from  approximately  26  seconds  to  16  seconds. 
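The satellite mechanism can be sketched in Python as follows. The names `try_rule`, `save_satellite` and `exact`, the `_SAT` naming scheme and the relation triples are hypothetical; in particular `exact` is only a stand-in for the domain search of chapter 4, not a reimplementation of it.

```python
def try_rule(rule, scene, graphs, satellites, match):
    """Return the first of rule or its satellites that matches the scene."""
    for name in [rule] + satellites.get(rule, []):
        if match(graphs[name], scene):
            return name
    return None


def save_satellite(rule, domain, graphs, satellites):
    """Save the domain of an accepted best match as a satellite of rule."""
    name = f"{rule}_SAT{len(satellites.get(rule, [])) + 1}"
    graphs[name] = frozenset(domain)              # a) save the domain
    satellites.setdefault(rule, []).append(name)  # b), c) new node and link
    return name


def exact(model, scene):
    # Stand-in for the domain search: every model relation is in the scene.
    return model <= scene


graphs = {"M30": frozenset({("body", "next", "tail"), ("body", "next", "wing")})}
satellites = {}
scene = frozenset({("body", "next", "tail")})  # the wing is not visible

first = try_rule("M30", scene, graphs, satellites, exact)
sat = save_satellite("M30", scene, graphs, satellites)
second = try_rule("M30", scene, graphs, satellites, exact)
```

On the first attempt the complete rule fails and an expensive best-match search would be needed; once the accepted domain is saved, the second attempt succeeds directly through the satellite.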


7.   CONCLUSIONS  AND  SUGGESTIONS  FOR  FUTURE  WORK 

In this thesis we developed a general methodology for the analysis and
parsing of graph-representable pictures. This methodology is general in
the sense that as long as the entities subject to analysis can be represented
as graphs, it provides a valid technique of analysis for these entities. We
have shown the application of this technique to picture analysis throughout
this thesis, and in particular we have demonstrated the results of its
application to a simple class of pictures. Here we show its generality by listing
the steps involved in any application:

1.  Define  the  class  of  entities. 

2.  Define  the  primitives  (atoms)  of  this  class,  which  represent  the 
primitive  nodes  in  our  graph  representation. 

2.1   Define  a  set  of  primitive  node  attributes  whose  values  will  identify 
these  atoms. 

3.  Define  a  class  of  relations  which  can  be  used  to  represent  the  context- 
ual relationships  of  these  atoms  as  well  as  this  class  of  entities. 

3.1   Investigate  the  embedding  properties  of  these  relations,  and  define 
procedures  to  embed  branches  whenever  it  is  required  in  connection 
with  transformations. 

4.  Develop  preprocessing  techniques  which  can  transform  the  natural 
occurrence  of  these  entities  to  this  graphical  form. 

5.  Define  the  graph  structure  which  represents  the  entities  in  the  universe. 

6.  Our defined recognizer and domain-search algorithms can then be applied
directly.

As we have seen in chapter 6, our system has considerable learning
capability. If the user chooses not to define the graph structure, he can
let the system build its own graph structure through experience and
learning.

In  defining  the  graph  structure  of  a  class  of  3-dimensional  objects  we 
can  choose  either  of  the  following  two  methods: 

1.  Define  a  graphical  model  for  each  3-dimensional  part  of  the  objects.   In 
this  graphical  representation,  there  should  be  enough  information  to 
enable  us  to  construct  the  graphical  representation  of  this  part  at  all 
different  projecting  angles. 

2.  A  useful  alternative  is  the  case  where  we  allow  a  limited  number  of 
projecting  angles.   In  this  case  we  can  have  a  single  node  in  the  model, 
which  represents  all  visible  regions  tagged  with  the  angles  at  which 
they  are  visible. 

For some classes of pictures it has been shown (Guzman (1968)) that there
is a set of general rules which can act on the input graph and discover
the cluster of regions which correspond to these 3-dimensional parts. In
this research we have chosen the second approach; to establish the
semantic equivalence, every region visible at any angle should simply
have a corresponding node in the scene graph.

7.1   Parallel  Processing 

Another area open for investigation is parallel graph processing
in connection with the graph structure. To achieve this parallel processing the
control structure of our SOL procedures must be much more flexible than that
of PL/1. Our recognition system, in addition to moving up and down
hierarchical trees by initiating and terminating execution of access
modules, should be able to wander among the access modules by suspending
and resuming their executions in any order, unconstrained by the tree
structure of their inherent control relationships. For example, when two
different copies of the domain-search procedure are working on two different
parts of the graph, it should be possible to suspend one of them from inside
the other one. In [24] Bobrow and Wegbreit (1973b) define a general model
for control that has been used as the basis of implementations for such
languages. This model defines the set of information, or frame, that must be
associated with every activation of an access module to make possible its
suspension and reactivation in a meaningful way. The SOL graphs and
associations can be used to represent this type of control structure.

7.2   Occluded Objects

In our described method the partial matching technique is useful in
recognition of reasonably visible objects. Open for investigation here is
the discovery of some deductive procedures which can fill in the occluded
parts after the recognized object is removed. For example, in Fig. 7.1(a)
after the "bird" is recognized and removed, a continuity process should be
able to deduce and fill in the hidden parts of the legs. In our current
implementation, the "bird" and "man" are both recognized, but the lower
halves of the feet will probably be recognized as two "hammers". In Fig.
7.1(b) we have shown the input graph after the "man" and "bird" have been
recognized.

In  recognition  of  different  projections  of  the  objects,  we  require 
procedures  which  can  project  the  parts  of  an  object  at  any  angle  and  match  it 
against  the  part  of  scene  which  is  supposed  to  be  the  projection  of  this 
part  at  this  angle. 


[Figure 7.1: (a) picture; (b) input graph after recognition.]

Fig. 7.1.   Recognition of occluded objects

7.3   Relational Files

In our implementation, the knowledge about the universe (graph structure)
is saved in exactly the same structural form in which it will be used in
inferences. As the knowledge grows, we will encounter the problem of
partitioning this data set and managing the file. It would be desirable to
have a common data base which can be referenced by our system as well as by
other artificial intelligence and problem solving systems. Associative
memories like the one used in SAIL [25] look promising. For example,

     color ⊗ X ≡ red

can be used to extract all the primitive nodes whose "color" attribute has
the value "red". The set of primitives found by this search can be used in

     father ⊗ X ≡ Y

to find Y, the set of successors of each primitive node. Graphs can be
defined in the same manner by defining structural relations between nodes.
For example,

     node ⊗ G1 ≡ X

will locate all the nodes of graph G1. Since the graphical representations
are needed for domain search and other algorithms, we must also define the
interface procedures which extract this information from the data base and
construct the graphs as they are needed.
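
As a rough illustration of this kind of associative retrieval, the following
Python sketch stores (attribute, object, value) triples and answers the three
kinds of query discussed in this section. The class TripleStore, its method
names and the sample data (p1, n7, and so on) are assumptions made for
illustration only; this is not the SAIL associative memory itself.

```python
class TripleStore:
    """Toy associative store of (attribute, object, value) triples."""

    def __init__(self):
        self._triples = set()

    def add(self, attribute, obj, value):
        self._triples.add((attribute, obj, value))

    def objects(self, attribute, value):
        """All objects X for which the given attribute has the given value."""
        return {o for (a, o, v) in self._triples if a == attribute and v == value}

    def values(self, attribute, obj):
        """All values Y of the given attribute for the given object."""
        return {v for (a, o, v) in self._triples if a == attribute and o == obj}


# Hypothetical sample data: three primitive nodes in graph G1.
store = TripleStore()
store.add("color", "p1", "red")
store.add("color", "p2", "blue")
store.add("color", "p3", "red")
store.add("father", "p1", "n7")
store.add("father", "p3", "n8")
for node in ("p1", "p2", "p3"):
    store.add("node", "G1", node)

# All primitives whose "color" attribute is "red".
red_primitives = store.objects("color", "red")
# Successors ("father" values) of each red primitive.
successors = {x: store.values("father", x) for x in red_primitives}
# All nodes of graph G1.
g1_nodes = store.values("node", "G1")
```

An interface procedure of the kind mentioned above would walk such query
results and rebuild the node and branch structures that the matching
algorithms expect.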

We also propose, as further research, the implementation of an interactive
SOL imperative.


LIST  OF  REFERENCES 

[1]  Uhr, L. (Ed.), Pattern Recognition, New York, Wiley, 1966.

[2]  Michalski, R. S., "A Variable Valued Logic System as Applied to
Picture Description", Proceedings of the IFIP, May 22-26, 1972.

[3]  Maruyama, K., "A Study of Visual Shape Perception", Ph.D. Thesis,
Department of Computer Science, University of Illinois, 1972.

[4]  Jayaramamurthy, S. N., "Computer Methods for Analysis and Synthesis of
Visual Texture", to be reported as a Ph.D. Thesis, Department of Computer
Science, University of Illinois, 1973.

[5]  Strong, J. P. and Rosenfeld, A., "Automatic Cloud Cover Mapping",
Ph.D. Thesis, Computer Science Center, University of Maryland, 1971.

[6]  Guzman, A., "Analysis of Curved Line Drawings Using Context and Global
Information", in Machine Intelligence, edited by Meltzer, B. and
Michie, D. (1971), pp. 325-375.

[7]  Herzog, B., "Lectures on Computer Graphics", Computer and Program
Organization-Fundamentals, University of Michigan Engineering Summer
Conferences, June, 1967.

[8]  Kursland, H. E., "A General Purpose Graphic Language", Communications
of the ACM, Vol. 11, No. 4, April, 1968, pp. 247-254.

[9]  Schwebel, J. C., "Towards the Specification of a New Image Processing
Language", DCS File No. 788, University of Illinois at Urbana-Champaign,
February, 1969.

[10]  Chase, S. M., "Analysis of Algorithms for Finding All Spanning Trees
of a Graph", DCS Report No. 401, University of Illinois at Urbana-
Champaign, 1970.

[11]  Pratt, T. W. and Friedman, D. P., "A Language Extension to Graph
Processing and Its Formal Semantics", Communications of the ACM, Vol.
14, No. 7, July, 1971, pp. 460-467.

[12]  Earley, J., "Toward an Understanding of Data Structures",
Communications of the ACM, Vol. 14, p. 617, 1971.

[13]  Lieberman, R. N., "RSVP Relational Structure Vertex Processor", Tech.
Report No. 69-87, Computer Science Center, University of Maryland,
March, 1969.

[14]  Wolfberg, M. S., "An Interactive Graph Theory System", Report No. 69-25,
Moore School of EE, University of Pennsylvania, June, 1969.

[15]  Crespi-Reghizzi, S. and Morpurgo, R., "A Language for Treating Graphs",
Communications of the ACM, Vol. 13, No. 5, May, 1970, pp. 319-323.



[16]  Pflantz, J. L., "Web Grammars and Picture Description", Computer
Graphics and Image Processing (1972), pp. 193-220.

[17]  Schwebel, J. C., "A Graph-structure Transformation Model for Picture
Parsing", Ph.D. Thesis, Department of Computer Science, University of
Illinois at Urbana-Champaign, 1972.

[18]  Shaw, A. C., "Parsing of Graph Representable Pictures", Journal of
the ACM, Vol. 17, No. 3, July, 1970, pp. 453-481.

[19]  Guzman, A., "Decomposition of a Visual Scene into Three-dimensional
Bodies", AFIPS Fall Joint Computer Conference, Vol. 33, 1968, pp.
291-304.

[20]  Eastman, C. M., "Explorations of the Cognitive Processes in Design",
Computer Science Department, Carnegie-Mellon University, February,
1968.

[21]  Pflantz, J. L. and Rosenfeld, A., "Web Grammars", Proceedings
International Joint Conference on Artificial Intelligence, May, 1969, pp.
609-619.

[22]  McCormick, B. H. and Schwebel, J. C., "Use of Graph Transformations
to Characterize an Image: An Illustrative Example", File No. 770,
Department of Computer Science, University of Illinois at Urbana-
Champaign, July, 1968.

[23]  Bobrow, D. G., "Natural Language Input for a Computer Problem Solving
System", Semantic Information Processing, MIT Press, pp. 135-215.

[24]  Bobrow, D. G. and Wegbreit, B., "A Model for Control Structures for
Artificial Intelligence Programming Languages", Proceedings of IJCAI,
Stanford, California, 1973.

[25]  Feldman, J. A., et al., "Recent Developments in SAIL", An ALGOL-based
language for artificial intelligence, FJCC, 1972.

[26]  Rosenfeld, A., "Picture Processing", Computer Graphics and Image
Processing, 1972, pp. 394-416.

[27]  Advani, J. G., "Computer Recognition of Three Dimensional Objects from
Optical Images", Ph.D. Thesis, Ohio State University, 1972.

[28]  Gaffney, J. L., Jr., "TACOS: A Table Driven Compiler System",
DCS Report No. 325, University of Illinois at Urbana-Champaign.

[29]  Bobrow, D. G., "New Programming Languages for AI Research", Third
International Joint Conference on Artificial Intelligence, Stanford
University.

[30]  Preparata, F. P. and Ray, S. R., "An Approach to Artificial Non-Symbolic
Cognition", Information Sciences, 4 (January, 1972), pp. 65-86.



[31]  Buneman, O. P., "A Grammar for the Topological Analysis of Plane
Figures", Machine Intelligence 6, ed. by Meltzer and Michie, Edinburgh
University Press, 1970.

[32]  McCormick, B. H., "Experiments with an Image Processing Computer",
University of Illinois, DCS File No. 406, June, 1970.

[33]  Stanton, R. B., "Plane Regions: A Study in Graphical Communication",
DEC, University of South Wales, pp. 151-193.

[34]  Zahn, C. T., "Graph-theoretical Methods for Detecting and Describing
Gestalt Clusters", IEEE Transactions on Computers, SLAC-PUB-672, Nov.,
1969.

[35]  Schwebel, J. C., "Graph Transformations for Composite Formations", DCS
Report No. 368, University of Illinois, December, 1969.

[36]  McCormick, B. H. and Schwebel, J. C., "Consistent Properties of
Composite Formation under a Binary Relation", DCS Report No. 348,
University of Illinois, August, 1969.

[37]  Love, H. H., Jr. and Savitt, D. A. and Troop, J. R. E., "ASP: A New
Concept in Language and Machine Organization", SJCC, 1967.

[38]  Michalski, R. S., "A Geometrical Model for the Synthesis of Interval
Covers", DCS Report No. 461, June, 1971, University of Illinois.

[39]  McCormick, B. H. and Michalski, R. S., "Interval Generalization of
Switching Theory", University of Illinois, DCS Report No. 442, May,
1971.

[40]  Shaw, A. C., "On the Interactive Generation and Interpretation of
Artificial Pictures", SLAC-PUB-664, Cornell University, Ithaca, New York.

[41]  Smith, D. N., "GPL/I - A PL/I Extension for Computer Graphics", SJCC,
1971, pp. 511-528.

[42]  Anderson, J., Balke, K. G., and Earnest, C. P., "Analysis of Graphs
by Ordering of Nodes", Journal of the ACM, Vol. 19, No. 1, January, 1972.

[43]  Firscheing, O. and Fischler, M. A., "Describing and Abstracting
Pictorial Structures", Pattern Recognition, 1971, pp. 421-44[illegible].

[44]  Winograd, "Cognitive Psychology", Vol. 3, No. 1, whole issue.

[45]  Fu, K. S. and Swain, P. H., "On Syntactic Pattern Recognition",
Software Engineering, Academic Press, 1971.

[46]  Evans, T. G., "Grammatical Inference Techniques in Pattern Analysis",
Software Engineering, Academic Press, 1971.

[47]  Rosenfeld, A. and Strong, J. P., "A Grammar for Maps", Software
Engineering, Academic Press, 1971.



[48]  Minsky, M., "Semantic Information Processing", The MIT Press, 1968,
whole book.

[49]  Pavlidis, T., "Linear and Context-free Graph Grammars", Journal of
the ACM, Vol. 19, No. 1, January, 1972, pp. 11-22.

[50]  Winston, P. H., "Learning Structural Descriptions from Examples",
MAC-TR-76, Project MAC, MIT, September, 1970.

[51]  Narasimhan, R., "On the Description, Generation, and Recognition of
Classes of Pictures", Automatic Interpretation and Classification of
Images, New York and London: Academic Press, 1969.

[52]  Clowes, M. B., "Transformational Grammars and the Organization of
Pictures", Automatic Interpretation and Classification of Images, New
York and London: Academic Press, 1969.

[53]  Evans, T. G., "Descriptive Pattern Analysis Techniques", Automatic
Interpretation and Classification of Images, New York and London:
Academic Press, 1969.

[54]  Berkowitz, S., "GIRL-Graph Information Retrieval Language-Design of
Syntax", Software Engineering, Vol. 2, Academic Press, 1971.

[55]  Dodd, G. G., "APL--A Language for Associative Data Handling in PL/1",
FJCC (1969), pp. 677-684.

[56]  Knowlton, K. C., "A Programmer's Description of L6", ACM, Vol. 9,
1966, pp. 616-625.

[57]  Rovner, P. D., "An AMBIT/G Programming Language Implementation", MIT,
June, 1968.

[58]  Feldman, J. A. and Rovner, P. D., "The Leap Language and Data Structure",
Information Processing, pp. 579-585.

[59]  Pavlidis, T., "Structural Pattern Recognition: Primitives and
Juxtaposition Relations", Princeton University, January, 1971.

[60]  Nagy, G., "State of the Art in Pattern Recognition", Proceedings of the
IEEE, Vol. 56, 1968, pp. 836-862.

[61]  Levine, M. D., "Feature Extraction: A Survey", Proceedings of the IEEE,
Vol. 56, 1968, pp. 836-862.

[62]  Falk, G., Feldman, J. A., and Paul, R., "The Computer Representation
of Simply Described Scenes", in M. Faiman and J. Nievergelt (Eds.),
Pertinent Concepts in Computer Graphics, University of Illinois Press,
1969, pp. 87-103.

[63]  Anderson, R. H., "Syntax Directed Recognition of Handprinted Two-
dimensional Mathematics", Ph.D. Thesis, Harvard University, January,




[64]  Feder, J., "Linguistic Specification and Analysis of Classes of Line
Patterns", School of Engineering, EE Department, New York University,
April, 1969.

[65]  Feder, J., "The Linguistic Approach to Pattern Analysis: A Survey",
Report No. 400-133, School of Engineering, EE Department, New York
University, February, 1966.

[66]  Kirsch, R. A., "Computer Interpretation of English Text and Picture
Patterns", EC-13, No. 4, IEEE Trans. on Electronic Computers, August,
1964, pp. 363-376.

[67]  Knoke, P. J. and Wiley, R. C., "A Linguistic Approach to Mechanical
Pattern Recognition", Proceedings of the IEEE Computer Conference,
September, 1967, pp. 142-144.

[68]  Barrow, H. G. and Popplestone, R. J., "Relational Description in Picture
Processing", in Meltzer, B. and Michie, D. (Eds.), Machine Intelligence 6,
Edinburgh University Press, 1971, pp. 325-375.

[69]  Eastman, C. M., "Explorations of the Cognitive Processes in Design",
Computer Science Department, Carnegie-Mellon University, February, 1968.

[70]  Eastman, C. M., "Representations for Space Planning", Communications
of the ACM, Vol. 13, No. 4, April, 1970, pp. 242-250.

[71]  Montanari, G. U., "Networks of Constraints: Fundamental Properties and
Applications to Picture Processing", Department of Computer Science,
Carnegie-Mellon University, January, 1971.

[72]  Duda, R. O. and Hart, P. E., "A Survey of Pattern Classification and
Scene Analysis", Stanford Research Institute, January, 1971.

[73]  Huffman, D. A., "Impossible Objects as Nonsense Sentences", in Meltzer,
B. and Michie, D. (Eds.), Machine Intelligence 6, Edinburgh University
Press, 1971.

[74]  Pavlidis, T., "Computer Recognition of Figures through Decompositions",
Information and Control, Vol. 13, 1968, pp. 526-537.

[75]  Pavlidis, T., "Computer Analysis of Figures into Primary Convex
Subsets", Proceedings of the IEEE [illegible] Conference, cat. no.
68C[illegible], 1968, pp. 55-60.

[76]  Salton, G. and Sussengath, E. H., "Some Flexible Information Retrieval
System Using Structure Matching Procedures", AFIPS, Proceedings of the
Spring Joint Computer Conference, 1964, pp. 587-597.

[77]  Matula, D. W., "Cluster Analysis via Graph Theoretic Techniques", in
Mullin, R. C., Reid, K. B. (Eds.), Proceedings of the Louisiana
Conference on Graph Theory, Combinatorics and Computing, 1970,
pp. [illegible].




[78]  Zahn, C. T., "Graph-theoretical Methods for Detecting and Describing
Gestalt Clusters", IEEE Transactions on Computers, Vol. C-20, No. 1,
January, 1971, pp. 68-86.

[79]  Pflantz, J. L., "Convexity in Graphs", TR-68-74, Computer Science
Center, University of Maryland, July, 1968.

[80]  Montanari, G. U., "Separable Graphs, Planar Graphs, and Web Grammars",
Information and Control, Vol. 16, No. 3, May, 1970, pp. 243-267.

[81]  Evans, T. G., "A Heuristic Program to Solve Geometric-Analogy Problems",
AFIPS, Proceedings of the Spring Joint Computer Conference, 1964,
pp. 327-328.

[82]  Siklossy, L., "Generalized Means-Ends Analysis and Artificial
Intelligence", Information Sciences, Vol. 3, 1971, pp. 149-158.

[83]  Holden, A. D. C. and Johnson, D. L., "The Use of Embedded Patterns and
Canonical Forms in a Self-Improving Problem Solver", Proceedings of the
ACM National Meeting, 1967, pp. 211-219.

[84]  Winston, P. H., "A Heuristic Program that Constructs Decision Trees",
AI Memo 173, Project MAC, MIT, March, 1969.




APPENDIX  A 
SOL 


Programming languages are specialized in the sense that the set of unique
facilities provided by a language makes some types of programs easier to
write in that language than in any other. Indeed, the main reason for
introducing new features into a programming language is to automate
procedures that the user needs and would otherwise have to code explicitly;
such features reduce the housekeeping details that distract the user from
the algorithms in which he is really interested. The structure operation
language (SOL) has been designed and implemented to facilitate the
representation and transformation of graph structures, together with the
associated operations needed in the field of artificial intelligence,
specifically in picture processing. SOL consists of statements, called graph
structure statements, which are embedded in the procedural language PL/1.
SOL statements consist of declarations, associations and operations.

Declarations declare the basic structural elements. Associations declare
and associate any PL/1 data type (excluding based and controlled variables)
with the basic structural elements. Operations are performed on the
declared structural elements, and are capable of dynamically creating or
deleting elements.

Basic  Structural  Elements 

There  are  six  basic  structural  elements  which  are  designated  by  the 
attributes  in  the  declarations  or  implied  by  the  class  of  operations  which 
create  them  dynamically.   These  elements  are  pointer,  set,  node,  branch, 
subgraph  and  graph. 



A  pointer  points  to  or  designates  any  other  basic  element.   A  pointer 
can  designate  a  pointer,  set,  node,  branch,  subgraph  or  graph.   Thus,  a 
pointer  allows  indirect  reference  to  any  element.   SOL  pointers  are  useful 
in  forming  a  variable  set  of  elements;  in  other  words,  although  the  elements 
of  the  set  are  the  same  set  of  pointers,  we  can  effectively  change  the  set 
by  reassigning  different  values  to  these  pointers. 

A  set  designates  a  set  of  basic  elements.   These  elements  can  include 
SOL  pointers,  which  in  turn  can  point  to  any  other  basic  structural  elements. 

Nodes,  branches  and  subgraphs  are  elements  of  graphs.   A  node  designates 
a  set  of  branches  which  are  called  adjacent  branches  of  the  node.   A  branch 
designates  a  pair  of  nodes.   An  oriented  branch  designates  an  ordered  pair 
of  nodes,  the  tail  and  the  head  of  the  branch.   An  unoriented  branch 
designates  an  unordered  pair  of  nodes. 

A  graph  is  a  set  of  nodes,  branches  and  subgraphs  with  the  property  that 
if  a  branch  belongs  to  a  graph  then  its  tail  and  head  also  belong  to  the 
graph.   Graphs  will  be  assumed  to  have  oriented  branches  unless  specifically 
declared  otherwise. 

A subgraph is a collection of nodes and branches such that if a branch
belongs to the subgraph, its head and tail must also belong to the subgraph.
In a collection of disjoint subgraphs, no node of one subgraph is shared
with any other subgraph of the collection.
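
The closure property of graphs and subgraphs, and the meaning of a disjoint
collection, can be sketched in Python as follows. The names Branch, Graph,
add_branch, is_closed and disjoint are hypothetical and chosen only for this
illustration; SOL itself is embedded in PL/1 and is implemented differently.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Branch:
    """An oriented branch: an ordered pair (tail, head) with a label."""
    label: str
    tail: str
    head: str


@dataclass
class Graph:
    """A graph or subgraph: a collection of nodes and branches."""
    nodes: set = field(default_factory=set)
    branches: set = field(default_factory=set)

    def add_branch(self, label, tail, head):
        # Adding a branch also adds its end nodes, so the closure
        # property (tail and head belong to the graph) always holds.
        self.nodes.update((tail, head))
        branch = Branch(label, tail, head)
        self.branches.add(branch)
        return branch

    def is_closed(self):
        return all(b.tail in self.nodes and b.head in self.nodes
                   for b in self.branches)


def disjoint(subgraphs):
    """True if no node is shared by two subgraphs of the collection."""
    seen = set()
    for sg in subgraphs:
        if seen & sg.nodes:
            return False
        seen |= sg.nodes
    return True


h1 = Graph()
h1.add_branch("B2", "N2", "N3")
h2 = Graph()
h2.add_branch("B4", "N3", "N4")   # shares node N3 with h1
h3 = Graph()
h3.add_branch("B5", "N5", "N6")   # shares no node with h1
```

Here h1 and h2 are not disjoint because both contain N3, while h1 and h3
are.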

Declarations 

The DECLARE statement is used to declare the six basic structural elements.
An element is declared by an identifier followed by an attribute which
specifies the type of that element.

Declaring an element creates a new data element with the name given by
the identifier. A level hierarchy is assumed within a declaration, such that
any graph element declared is assumed to belong to the last declared element
(graph or subgraph) at a lower numbered level. Nodes and subgraphs in the
scope of a disjoint subgraph (or graph) are assumed to have unique names.
The repetition of the same name in the same scope is interpreted as a
reference to the same element. The default level is 1 for a graph and 2 for
all other elements of the graph. A branch is always identified by its end
nodes, so the declaration of a branch with the same tail and head and the
same name is interpreted as a reference to an existing branch.

The attribute of a branch is of the form BR(N1,N2), where N1 and N2 are
the names of the tail and head of the branch respectively. This declaration
will also declare the head and tail as nodes if they have not already been
declared. Fig. A.1 shows some examples of declarations. By default, a
subgraph is non-disjoint.

Names  and  References  to  Elements 

Names are not required for nodes and branches, even though they are
explicitly required in the DECLARE statement. Elsewhere in the SOL language,
operations which dynamically modify a structure can result in the creation
of a new unnamed element. Names need not be unique, since references are
based on pointers to the individual elements of the graph. Elements with
non-unique names may be referenced uniquely, for instance by their position
relative to other elements.

By a reference to an element, we mean a unique way of referring to that
element. An element can be referenced by its name when the name is unique
within the current scope or context. In Fig. A.1 example 2, the nodes with
the name N2 can be referenced uniquely by N2 and H.N2 respectively.
Reference by name is important in the process of declaration. If we wish two
nodes or subgraphs to have the same name in the same scope, we can give them
temporary names and change these names in subsequent operations. Since
branches are identified by their tail and head nodes in addition to their
names, they need not have unique labels.

Example 1:

     DCL  G1  GR,
          2  N1  ND,
          2  B1  BR(N1,N2),
          2  H1  SG,
          3  B2  BR(N2,N3),
          2  H2  SG,
          3  B2  BR(N2,N3),
          3  B3  BR(N2,N4),
          3  B4  BR(N3,N4);

Example 2:

     DCL  G2  GR,
          B1  BR(N1,N3),
          B1  BR(N2,N1),
          B2  BR(N2,N3),
          2  H  SG DISJOINT,
          3  B3  BR(N2,N1),
          2  B2  BR(N3,H.N2);

Fig. A.1.  Declaration examples

An element can always be referenced by a pointer, since a pointer always
points to only one element. Names for graphs, pointers and sets are always
references, i.e. they are unique in the current scope. Subgraphs, nodes and
branches, however, may have non-unique names. In this case, they may be
referenced by a pointer, by a qualified name, or by other contextual means.
Non-disjoint subgraphs need not be used to qualify their elements.

General  format 

     declaration ::= {DECLARE|DCL} [integer] <*I> attribute
                     {, [integer] <*I> attribute}* ;
     attribute ::= {GRAPH|GR} [UNORIENTED] |
                   {SUBGRAPH|SG} [DISJOINT] |
                   {NODE|ND} |
                   {BRANCH|BR} ({<*I>|node-ref}, {<*I>|node-ref}) |
                   {SET|ST} |
                   {POINTER|PT} .

Associations 


An ASSOCIATE statement allows PL/1 variables (excluding controlled and
based variables) to be declared and associated with a specified set of basic
structural elements of a given type. If the set is not specified, the
variables are associated with all the elements of that type. The FREE
statement nullifies the association. Thus, we can think of an association as
a function which takes an element as its argument and returns the value of
the associated variable.
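
The view of an association as a function from elements to values can be
sketched in Python as follows. The class Association and its methods assign,
value and free are hypothetical names for this illustration, and elements
are stood in for by plain strings; the sketch does not reproduce the based
variable mechanism used by the actual implementation.

```python
class Association:
    """One associated variable, viewed as a function element -> value."""

    def __init__(self, elements=None):
        # elements=None models "associated with all elements of the type".
        self._domain = None if elements is None else set(elements)
        self._values = {}

    def _check(self, element):
        if self._domain is not None and element not in self._domain:
            raise KeyError(f"{element!r} has no such associated variable")

    def assign(self, element, value):
        self._check(element)
        self._values[element] = value

    def value(self, element):
        self._check(element)
        return self._values.get(element)

    def free(self):
        # Models FREE: the association and its stored values are dropped.
        self._domain = set()
        self._values.clear()


# A variable associated with every branch (no element list given).
length = Association()
length.assign("G1.B1", 12)

# A variable associated only with one explicitly listed node.
angle = Association({"G1.ND3"})
angle.assign("G1.ND3", 45)
```

Calling length.value("G1.B1") then plays the role of the SOL reference
LENGTH(G1.B1), and after length.free() any further access fails.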



The list of element references specifies the elements (or the set of
elements) to be associated with or freed from the variables which follow.
Examples:

     BRASOC  LENGTH  BIN FIXED;

     NDASOC (G1.ND3,NOF(G2))  ARRAY(10)  DEC FIXED,
                              ANGEL  BIN FIXED,
                              TEXT  CHAR(30);

The first example will associate a variable "LENGTH" of type FIXED
BINARY with all the branches of all the known graphs in the current scope.
The second statement will associate an array of 10 elements of type DEC
FIXED, a variable "ANGEL" of type FIXED BINARY, and a character string
"TEXT" 30 characters long, with node "ND3" of graph "G1" and all the nodes
of graph "G2".

Since based variables are used to implement associations, all
restrictions on based variables apply to the associated variables in SOL.
The value of a variable associated with an element is accessed by using the
variable name followed by an element reference enclosed in parentheses.
These associated variables can also be used as pseudo-names to which values
are assigned.
Examples:

     LENGTH(G1.B1),
     LENGTH(P1),          /* P1 pointer to a branch */
     ARRAY(G1.ND3)(5),
     ANGEL(TAIL(P2)),     /* P2 pointer to a branch in graph G2 */
Syntax:

     association ::= associate | free
     associate ::= {PTASOC|STASOC|NDASOC|BRASOC|SGASOC|GRASOC}
                   [(element-ref {, element-ref}*)] pl1-declaration-tail ;
     pl1-declaration-tail ::= pl1-element-dcl {, pl1-element-dcl}*
     free ::= {PTFREE|STFREE|NDFREE|BRFREE|SGFREE|GRFREE}
              [(element-ref {, element-ref}*)] <*I> {, <*I>}* ;

Examples:

     GRASOC(G1)  1  A,
                 2  B  BIN FIXED,
                 2  C  CHAR(8);
     GRFREE(G1)  A;


Data  Operations 

Data operations dynamically add elements to, or delete elements from,
previously declared graphs, subgraphs and sets. A reference to the element
being modified appears after the operation name. The identifier and
attributes (or context) which follow refer to the new elements to be added
or the elements to be deleted.

Deleting an element from a set does not destroy the element but only
deletes the reference to the element contained in the set. Similarly,
deleting a node or branch from a non-disjoint subgraph will not destroy the
node or branch, since it will still be a member of a graph or a disjoint
subgraph at a lower level. Deleting an element from a graph or a disjoint
subgraph, however, destroys the element.
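
The difference between deleting a reference and destroying an element can be
sketched in Python with ordinary sets. The function names delete_reference
and delete_from_owner, and the rule that the owner updates an explicit list
of referring containers, are assumptions of this sketch rather than a
description of the SOL implementation.

```python
# Hypothetical containers holding node names.
graph_g1 = {"N1", "N2", "N5"}   # owning graph: holds the elements themselves
subgraph_h = {"N2", "N5"}       # non-disjoint subgraph: holds references only
set_s1 = {"N5"}                 # SOL set: holds references only


def delete_reference(container, element):
    """Delete from a set or non-disjoint subgraph: only the reference goes."""
    container.discard(element)


def delete_from_owner(owner, referrers, element):
    """Delete from a graph or disjoint subgraph: the element is destroyed,
    so every container that still refers to it must drop the reference."""
    owner.discard(element)
    for container in referrers:
        container.discard(element)


# Removing N5 from the non-disjoint subgraph leaves it alive in the graph.
delete_reference(subgraph_h, "N5")
alive_after_reference_delete = "N5" in graph_g1 and "N5" in set_s1

# Removing N5 from the owning graph removes it everywhere.
delete_from_owner(graph_g1, [subgraph_h, set_s1], "N5")
alive_after_owner_delete = "N5" in graph_g1 or "N5" in set_s1
```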

Examples:

     ADD NODE N5 TO GR G1;        /* adds node N5 to graph G1 */
     DEL NODE N5 FROM G1.H;       /* deletes node N5 from subgraph H */
     DEL P1 FROM SET S1;          /* deletes pointer P1 from set S1 */
     ADD SUBGRAPH SG1 TO GR G2;   /* adds subgraph SG1 to graph G2 */
     ADD BR NEXT (N1,N2) TO G2;   /* adds a branch with label "NEXT"
                                     between nodes N1 and N2 of graph G2 */

Deleting a node, a branch or a subgraph will also delete it from the sets
referring to this element.
Syntax:

     data-operation ::= add-operation | delete-operation
     add-operation ::= ADD element-ref TO element-ref ;
     delete-operation ::= {DEL|DELETE} element-ref FROM element-ref ;

We will define element-ref later. Some combinations of the two element-refs
appearing in a data-operation are not acceptable.

Loop-Control 

The loop-control statement iterates the statements between a "FOR"
statement and the corresponding "END" statement over the elements of the set
specified in the "FOR" statement. FOR (i,v,S) specifies that each iteration
will use a different value of the variable v, chosen from the set S. The
variable i specifies the number of iterations to be performed. "ANY" is
equivalent to 1 and "ALL" is equivalent to the cardinality of the set S.

Changing the set S within the loop can change the number of iterations
performed. For example, if an element x is deleted from S before x has been
used as the value of the loop-control variable v, then x will not be used.
Example:

     FOR (ALL,NP,NOF(G)); ... END;

The loop-control is a very useful statement.
Syntax:

     loop-control ::= FOR ({ANY|ALL|integer}, pointer-ref, set-ref) ; | END ; .
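
The iteration rule, including the effect of deleting elements during the
loop, can be sketched as a Python generator. The name sol_for is
hypothetical, the visiting order (sorted) is chosen only to make the sketch
deterministic, and the choice that elements added during the loop are not
visited is an assumption not stated in the text.

```python
def sol_for(count, members):
    """Yield up to `count` elements of the mutable set `members`.

    count is "ANY" (one iteration), "ALL" (cardinality at loop entry)
    or an integer. An element deleted from the set before it has been
    used as the loop value is skipped.
    """
    if count == "ANY":
        limit = 1
    elif count == "ALL":
        limit = len(members)
    else:
        limit = int(count)
    done = 0
    for element in sorted(members):   # snapshot of the visiting order
        if done >= limit:
            break
        if element not in members:    # deleted inside the loop body
            continue
        done += 1
        yield element


# Deleting N3 while visiting N1 reduces the number of iterations to two.
nodes = {"N1", "N2", "N3"}
visited = []
for n in sol_for("ALL", nodes):
    visited.append(n)
    if n == "N1":
        nodes.discard("N3")

any_pick = list(sol_for("ANY", {"A", "B"}))
first_two = list(sol_for(2, {"A", "B", "C"}))
```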



Element-operations

These  operations  operate  on  a  single  or  a  pair  of  elements  and  return  a 
value  depending  on  the  nature  of  the  operation. 
Sets-set 

Sets-set operations are binary operations on sets which return a set. The
returned set is the UNION, INTERSECTION, DIFFERENCE, or SYMMETRIC DIFFERENCE
of the given sets. If the two arguments are respectively $S_1$ and $S_2$,
then

     UNION(S1,S2)  = $S_1 \cup S_2$     (OR)

     INTER(S1,S2)  = $S_1 \cap S_2$     (AND)

     DIF(S1,S2)    = $S_1 - S_2 = S_1 \cap \bar{S}_2$

     SYMDIF(S1,S2) = $S_1 \oplus S_2 = (S_1 \cap \bar{S}_2) \cup (\bar{S}_1 \cap S_2)$


Example:

     S = UNION(S1,BOF(G));

which assigns to set S the union of set S1 and the set of all branches in
graph G.
Syntax:

     sets-set ::= set-set-mnemonic (set-ref, set-ref)
     set-set-mnemonic ::= UNION | INTER | DIF | SYMDIF .
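
The four sets-set operations correspond directly to Python's built-in set
operators, which makes it easy to check the complement forms of the
difference and symmetric difference. The lower-case function names mirror
the SOL mnemonics, and the finite UNIVERSE used to form complements is an
assumption of this sketch.

```python
def union(s1, s2):
    return s1 | s2


def inter(s1, s2):
    return s1 & s2


def dif(s1, s2):
    return s1 - s2


def symdif(s1, s2):
    return s1 ^ s2


# A small finite universe so that complements are well defined.
UNIVERSE = frozenset(range(10))


def complement(s):
    return UNIVERSE - s


S1 = {1, 2, 3, 4}
S2 = {3, 4, 5}

# DIF as an intersection with the complement of the second argument.
dif_direct = dif(S1, S2)
dif_by_complement = inter(S1, complement(S2))

# SYMDIF as the union of the two one-sided differences.
symdif_direct = symdif(S1, S2)
symdif_by_complement = union(inter(S1, complement(S2)),
                             inter(complement(S1), S2))
```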

Sets-Boolean 

Sets-boolean returns a boolean value corresponding to the truth or
falsity of one of the statements

     "S1 equals S2",
     "S1 is a subset of S2".

Syntax:

     sets-boolean ::= {EQUALS|SUBSET} (set-ref, set-ref) .



Graph-set 

The graph-set operations operate on a graph and return the set of nodes
(NOF) or branches (BOF) of the graph.
Example:

     S = NOF(G1)   /* S is the set of all nodes in graph G1 */ .

Syntax:

     graph-set ::= {NOF|BOF} ({graph-ref|subgraph-ref})
     graph-ref ::= [GRAPH|GR] <*I> | pointer-ref .

Node-set 

These operations operate on a node and return the set of incoming
(INCBR), outgoing (OUTBR) or adjacent (ADJBR) branches of the node,
respectively.
Example:

     S = ADJBR(G1.N1);

Syntax:

     node-set ::= {ADJBR|INCBR|OUTBR} (node-ref)
     node-ref ::= branch-node | [NODE|ND] qualified-name | pointer-ref .
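
For an oriented graph, the three node-set operations can be sketched in
Python over a list of (label, tail, head) triples. The sample branches and
the lower-case function names are assumptions for this illustration; a
branch is taken to be incoming at its head and outgoing at its tail.

```python
# Hypothetical oriented graph: (label, tail, head) for each branch.
BRANCHES = [
    ("B1", "N1", "N2"),
    ("B2", "N2", "N3"),
    ("B3", "N3", "N1"),
]


def incbr(node):
    """Branches whose head is the given node."""
    return {label for label, tail, head in BRANCHES if head == node}


def outbr(node):
    """Branches whose tail is the given node."""
    return {label for label, tail, head in BRANCHES if tail == node}


def adjbr(node):
    """All branches touching the node, regardless of orientation."""
    return incbr(node) | outbr(node)


incoming_n1 = incbr("N1")
outgoing_n1 = outbr("N1")
adjacent_n1 = adjbr("N1")
```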

Branch-node 

The HEAD and TAIL functions operate on a branch and return the head and
tail of the branch respectively.

Example:

     P1 = HEAD(P2);

Syntax:

     branch-node ::= {HEAD|TAIL} (branch-ref)
     branch-ref ::= [BRANCH|BR] <*I> (node-ref, node-ref) |
                    [BRANCH|BR] qualified-name | pointer-ref .




Set-integer 

The CARD function operates on a set and returns an integer value
which indicates the cardinality of the set.

Example:

     N = CARD(NOF(G));

     N = CARD(UNION(S1,S2));

Syntax:

     set-integer ::= CARD (set-ref) .

Set-ref

A set reference is one of the following: a sets-set, a node-set, a
graph-set or a name-reference (PL/1 pointer).

Syntax:

     set-ref ::= sets-set | node-set | graph-set | [SET|ST] <*I> .

NAME  and  TYPE 

The "NAME" function returns the name of the referenced element. The
"TYPE" function returns the type of the referenced element. The types are
"PT", "ST", "ND", "BR", "SG", and "GR".

The "NAME" and "TYPE" functions can also be used as pseudo-variables to
alter the name and type of an element respectively.

Syntax:

     element-string ::= {NAME|TYPE} (element-ref) .

The following is a summary syntax of element-operations:

     element-operations ::= sets-set | sets-boolean | graph-set | node-set
                            | branch-node | set-integer | element-string .

Important 

When we wish to use a SOL operation in a PL/1 statement, an @ should be
added to the name of the operation, and PL/1 pointers should be used as
element references. Such a PL/1 pointer must be initialized to the proper
value before it is used.

Element-ref 

An element reference refers to any of the six basic elements in SOL.
Syntax:

     element-ref ::= pointer-ref | set-ref | node-ref | branch-ref
                     | subgraph-ref | graph-ref
     pointer-ref ::= [POINTER|PT] <*I>
     subgraph-ref ::= [SUBGRAPH|SG] qualified-name | pointer-ref .

The others have already been defined.

Pointer-operations

Pointer-operations change the value of a pointer. They are primarily
useful for moving a pointer to the nodes and branches adjacent to the
element currently pointed to, or for moving the pointer to any other element
of the graph which has the same type. They are also necessary when we have
to use contextual information for reference.

The  "MOVE"  operation  moves  the  pointer  to  an  adjacent  branch  if  it 
points  to  a  node,  or  to  an  adjacent  node  if  it  points  to  a  branch.   If  the 
node  (branch)  which  we  are  moving  to  is  specified  by  name  or  reference,  the 
pointer  can  only  be  moved  to  a  node  (branch)  with  that  name  or  reference. 
If  the  named  node  (branch)  does  not  exist  as  an  adjacent  node  (branch),  the 
pointer  is  not  changed. 

The  "OMOVE"  or  oriented  move  operation  is  similar  to  "MOVE"  except  that 
moves  will  be  made  along  branches  only  in  the  tail-to-head  direction. 

The "JUMP" operation moves the pointer to any node or branch on the
graph. "JUMP" will only move from a node to a node or from a branch to a
branch, so that the type of element pointed to is not changed. If no
reference to the new element is present, an arbitrarily picked node (branch)
is used. If a reference by name exists and no node (branch) with that name
can be found, then the pointer remains unchanged.

The "MOVE2" operation is similar to two consecutive moves, so that a
node pointer is moved to an adjacent node or a branch pointer is moved to an
adjacent branch. "MOVE2" is not equivalent to two consecutive "MOVE"s, however,
since the operation will be performed completely or not at all. Thus, for
example, "MOVE2 PTR L LEFT-OF L CUBE;" will move a pointer named PTR (from the
current node) along a branch named "LEFT-OF" to a node named "CUBE" only if
both the branch and the node exist. However, the sequence "MOVE PTR L LEFT-OF;
MOVE PTR L CUBE;" can result in only moving the pointer to a branch named
"LEFT-OF". If "CUBE" is not specified, PTR is moved via branch "LEFT-OF"
to an adjacent node. If "LEFT-OF" is not specified either, PTR is moved
via any branch to an adjacent node. If the specified conditions are not
satisfied, the pointer remains unchanged.

The "OMOVE2" operation is like "MOVE2", except that moves are made along
the branches in the tail-to-head direction only.

Assume a pointer named "BUG" is currently pointing to a node. Then
the operations:

MOVE BUG L B1;               /* moves BUG to an adjacent branch named "B1" */
MOVE BUG;                    /* moves BUG to an adjacent branch */
MOVE2 BUG;                   /* moves BUG to an adjacent node */
MOVE2 BUG L ABOVE;           /* moves BUG along a branch named "ABOVE" to
                                an adjacent node */
OMOVE2 BUG L HIGHER PT PTR;  /* will move the BUG along an outgoing
                                branch named "HIGHER" to a node referenced
                                by pointer PTR */



Syntax: 

pointer-operation :: = {JUMP|MOVE|OMOVE} pointer-ref
                          [{branch-ref|node-ref} | label-name]
                       | {MOVE2|OMOVE2} pointer-ref
                          [{branch-ref|label-name} [{node-ref|label-name}]
                           | {node-ref|label-name} [{branch-ref|label-name}]] ;
label-name :: = {LABEL|L} <*I> .
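
The difference between two consecutive MOVEs and a single MOVE2 can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the SOL run-time code: the names `Node`, `Branch`, `incident`, `far_end`, `move` and `move2`, and the representation of a graph as a plain list of branches, are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str


@dataclass(frozen=True)
class Branch:
    name: str
    tail: Node
    head: Node


def incident(branches, node, oriented=False):
    """Branches touching `node`; with oriented=True, only those leaving it."""
    return [b for b in branches
            if b.tail == node or (not oriented and b.head == node)]


def far_end(branch, node):
    """The node at the other end of `branch`, as seen from `node`."""
    return branch.head if branch.tail == node else branch.tail


def move(branches, pointer, name=None, oriented=False):
    """Single step (MOVE/OMOVE): node -> adjacent branch, branch -> adjacent node."""
    if isinstance(pointer, Node):
        candidates = incident(branches, pointer, oriented)
    else:
        candidates = [pointer.head] if oriented else [pointer.tail, pointer.head]
    for element in candidates:
        if name is None or element.name == name:
            return element
    return pointer  # no match: the pointer is not changed


def move2(branches, node, branch_name=None, node_name=None, oriented=False):
    """Two steps (MOVE2/OMOVE2) performed completely or not at all."""
    for b in incident(branches, node, oriented):
        target = far_end(b, node)
        if branch_name in (None, b.name) and node_name in (None, target.name):
            return target
    return node  # conditions not satisfied: the pointer is not changed


# Hypothetical scene: A is LEFT-OF a sphere, and a cube is ABOVE A.
a, cube, sphere = Node("A"), Node("CUBE"), Node("SPHERE")
scene = [Branch("LEFT-OF", a, sphere), Branch("ABOVE", cube, a)]

# Two consecutive MOVEs can strand the pointer on the branch ...
stranded = move(scene, move(scene, a, "LEFT-OF"), "CUBE")
# ... whereas MOVE2 leaves it on the starting node.
unchanged = move2(scene, a, "LEFT-OF", "CUBE")
```

Here `stranded` ends up on the "LEFT-OF" branch, while `unchanged` is still the starting node, which mirrors the "LEFT-OF"/"CUBE" example in the text.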

Higher-level  graph  operations 

Node-operations  and  subgraph-operations  deal  with  a  single  node  and  a 
collection  of  nodes  in  the  graph,  respectively. 

Node  operations 

These operations are generative. PARTITION operates on a node and
partitions it into a set of nodes with branches among them. Normally a
graph or subgraph reference tells us how the new elements are to be
generated. The creation of branches between the nodes of this created
subgraph and the other nodes of the graph is application dependent and must be
specified in a user-prepared SOL procedure. This procedure can be referenced
by name in the partition statement.

The operation GENERATE is like PARTITION, but it preserves the
partitioned node and maintains a link between the node and the generated
subgraph.
Example : 

PARTITION PTR1 INTO PTR2 AS PROC1;

Here PTR1 is a pointer reference to a node and PTR2 points to a graph
or a subgraph, while the procedure "PROC1" dictates how this graph or
subgraph should be placed in the original graph.


Syntax:

node-operation :: = {PARTITION|GENERATE} node-ref [INTO]
                    {graph-ref|subgraph-ref} AS <*I>; .

Subgraph-operations

The MERGE operation is the inverse of PARTITION: it reduces (merges)
a collection of nodes and the branches between them (a subgraph) into a
single node. The new node may be named, or it may be pointed to by assigning
the result of the merge (the new node) to a pointer. In our implementation a
pointer to the new node is always assigned to the PL/1 pointer "$ELPTR0".

Since parts of the merge operation are application dependent, the user
should provide his own procedure to handle these case-dependent parts.

The following actions occur when nodes Xi are merged into a node X:

1. A new node, X, is created.

2. The user-specified procedure embeds the necessary branches and makes the
necessary associations with the newly formed node.

3. All nodes of the subgraph and all branches adjacent to these nodes are
deleted.
The PARSE operation is similar to the MERGE operation, except that the
merged subgraph is not deleted; only the type of the nodes and of their adjacent
branches in this subgraph is changed, to make them temporarily inactive.
Again a user-provided procedure is necessary to embed the necessary branches
and process associations. A list of the merged nodes is kept as the memory
of the new node. Preserving this information makes the parse (transformation)
reversible.

The BACKUP operation reverses the actions taken by the PARSE operation, and
the environment is restored to the state before the parse was applied.

Example : 

BACKUP    PT   PTR; 
will  restore  the  graph  to  the  state  before  the  parse  created  the  node 
pointed  to  by  PTR.   If  the  memory  of  this  node  was  null  nothing  would 
happen. 
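
The reversibility of PARSE can be illustrated with a small Python model. It is a sketch under assumptions, not the thesis implementation: the class `SceneGraph`, the type tag "INACTIVE", the helper `reattach` (standing in for the user-provided embedding procedure) and the chair example are all hypothetical.

```python
class SceneGraph:
    """Toy graph in which PARSE hides a subgraph and BACKUP restores it."""

    def __init__(self):
        self.nodes = {}     # name -> {"type": ..., "memory": ...}
        self.branches = []  # each: {"name", "tail", "head", "type"}

    def add_node(self, name):
        self.nodes[name] = {"type": "ND", "memory": None}

    def add_branch(self, name, tail, head):
        self.branches.append(
            {"name": name, "tail": tail, "head": head, "type": "BR"})

    def snapshot(self):
        """Active nodes and branches, sorted for easy comparison."""
        nodes = sorted(n for n, e in self.nodes.items() if e["type"] == "ND")
        branches = sorted((b["name"], b["tail"], b["head"])
                          for b in self.branches if b["type"] == "BR")
        return nodes, branches

    def parse(self, members, new_name, embed):
        members = set(members)
        touched = [b for b in self.branches if b["type"] == "BR"
                   and (b["tail"] in members or b["head"] in members)]
        self.add_node(new_name)
        # The memory of the new node records what was merged into it.
        self.nodes[new_name]["memory"] = (members, touched)
        embed(self, new_name, members)  # user-supplied, application dependent
        for n in members:
            self.nodes[n]["type"] = "INACTIVE"
        for b in touched:
            b["type"] = "INACTIVE"

    def backup(self, name):
        memory = self.nodes[name]["memory"]
        if memory is None:
            return  # null memory: nothing happens
        members, touched = memory
        self.branches = [b for b in self.branches
                         if name not in (b["tail"], b["head"])]
        del self.nodes[name]
        for n in members:
            self.nodes[n]["type"] = "ND"
        for b in touched:
            b["type"] = "BR"


def reattach(graph, new_name, members):
    """Example embedding policy: copy boundary branches onto the new node."""
    for b in list(graph.branches):
        if b["type"] != "BR":
            continue
        tail_in, head_in = b["tail"] in members, b["head"] in members
        if tail_in != head_in:
            graph.add_branch(b["name"],
                             new_name if tail_in else b["tail"],
                             new_name if head_in else b["head"])


g = SceneGraph()
for n in ("LEG1", "LEG2", "SEAT", "FLOOR"):
    g.add_node(n)
g.add_branch("NEXT", "LEG1", "SEAT")
g.add_branch("NEXT", "LEG2", "SEAT")
g.add_branch("ABOVE", "SEAT", "FLOOR")

before = g.snapshot()
g.parse({"LEG1", "LEG2", "SEAT"}, "CHAIR", reattach)
parsed = g.snapshot()
g.backup("CHAIR")
restored = g.snapshot()
```

After the parse only CHAIR and FLOOR are active; after the backup the snapshot equals the one taken before the parse. A MERGE would differ only in deleting the members instead of keeping them in the memory of the new node.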

The DISCONNECT operation disconnects all branches between any node
outside and any node inside the set of nodes specified. If a label is
specified, only branches with that name are deleted.

The SINK and SOURCE operations act like DISCONNECT, except that SINK does
not disconnect branches directed into the node set, and SOURCE does not
disconnect branches directed out of the node set.

The operation LINK links all nodes into an arbitrarily ordered chain
by branches having the specified name or no name.
Examples : 

MERGE NOF(SG G1.H);       /* reduce all the nodes in subgraph H into a
                             single node */
PARSE SET SET2;           /* PARSE node set "SET2" */
SOURCE ST SET1 L CUT;     /* cut all branches with label "CUT" which are
                             directed to any node in set SET1 */
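
The selection rule shared by DISCONNECT, SINK and SOURCE can be written down compactly. The following Python function is a hedged sketch: the function name `disconnect`, the `mode` parameter and the representation of branches as (name, tail, head) tuples are assumptions made for illustration only.

```python
def disconnect(branches, node_set, label=None, mode="DISCONNECT"):
    """Return the branches that survive DISCONNECT, SINK or SOURCE.

    branches: list of (name, tail, head) tuples.
    node_set: names of the nodes the operation is applied to.
    label:    if given, only branches with this name may be removed.
    """
    kept = []
    for name, tail, head in branches:
        outgoing = tail in node_set and head not in node_set
        incoming = head in node_set and tail not in node_set
        crosses = outgoing or incoming
        # SINK spares branches directed into the set,
        # SOURCE spares branches directed out of it.
        spared = ((mode == "SINK" and incoming)
                  or (mode == "SOURCE" and outgoing))
        labelled = label is None or name == label
        if crosses and labelled and not spared:
            continue  # this branch is disconnected
        kept.append((name, tail, head))
    return kept


# Hypothetical data: node set {A, B} with branches crossing its boundary.
branches = [("CUT", "A", "X"), ("CUT", "X", "A"),
            ("ADJ", "A", "B"), ("KEEP", "Y", "A")]
inside = {"A", "B"}

all_cut = disconnect(branches, inside)
source_cut = disconnect(branches, inside, label="CUT", mode="SOURCE")
sink_cut = disconnect(branches, inside, mode="SINK")
```

With these data `source_cut` loses only the "CUT" branch directed into the set, which corresponds to the SOURCE example above; branches lying entirely inside the set are never touched.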
Syntax: 

subgraph-operation :: = BACKUP node-ref
                        | {MERGE|PARSE|DISCONNECT|SINK|SOURCE|LINK}
                          set-ref [{label-name|branch-ref}] [AS <*I>];

Some of these operations are shown in Fig. A.2.

Other  statements 

Fig. A.2. Graphical representation of some higher level operations
(MERGE, PARTITION, DISCONNECT, PARSE, GENERATE).

There are a score of other statements (IF-statement, GO TO, procedure
CALL, assignment, etc.) which have been implemented in this first compiler
for SOL, to make it a reasonably self-contained language. Most of these
statements have the same syntactic and semantic interpretation as their
counterparts in PL/1. Here we discuss only assignment statements.

Assignments 

Assignments are mainly used to assign values to the variables
associated with the SOL elements and to restore these values. They are also
used to initialize sets and to assign the results of element-operations in SOL
to PL/1 variables.
Examples:

LENGTH(PT PTR) = 3.5;             /* PTR is a pointer to a branch */

PTR1 = NODE G1.ND1;

ANGEL(PTR1)(2) = 90;              /* ANGEL is an array variable associated with
                                     node ND1 of graph G1 */

PTR2 = ADJBR(TAIL(G1.BR1));

S1 = (PTR1, NODE G1.N3, PTR3);    /* initializes S1 to the set of 3 nodes */

There is another type of assignment, which becomes useful when the
hierarchical structure of a graph is too deep. It facilitates
programming by allowing defined variables to be used in place of a
qualified name to refer to an element in the graph.

Example: Assume the qualified name for a node is G1.H1.H2.H3.ND1, and the
elements in subgraph H3 are used several times in the program. Then

#AUX = G1.H1.H2.H3;

makes the qualified name as simple as #AUX.ND1, and #AUX can be used with all
other elements in subgraph H3.
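
The effect of such an equivalence assignment can be mimicked with a simple prefix substitution. The Python sketch below is only an analogy under assumptions: the function `resolve`, the nested-dictionary model of the graph hierarchy and the `aliases` table are hypothetical and are not part of SOL.

```python
def resolve(qualified_name, aliases, root):
    """Follow a dotted qualified name, expanding a leading #alias first."""
    parts = qualified_name.split(".")
    if parts[0].startswith("#"):
        parts = aliases[parts[0]].split(".") + parts[1:]
    element = root
    for part in parts:
        element = element[part]  # descend one level of the hierarchy
    return element


# Hypothetical hierarchy G1 > H1 > H2 > H3 containing two nodes.
root = {"G1": {"H1": {"H2": {"H3": {"ND1": "node ND1", "ND2": "node ND2"}}}}}
aliases = {"#AUX": "G1.H1.H2.H3"}

short = resolve("#AUX.ND1", aliases, root)
full = resolve("G1.H1.H2.H3.ND1", aliases, root)
```

Both calls reach the same element, so the alias only shortens what has to be written.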
Syntax: 

assignment :: = set-assignment | equ-assignment | assoc-assignment
                | nor-assignment
set-assignment :: = set-ref = (element-ref{,element-ref}*) ;
equ-assignment :: = #<*I> = qualified-name ;
assoc-assignment :: = qualified-name (element-ref)
                      [(PL1-index{,PL1-index}*)] =
                      {qualified-name (element-ref) [(PL1-index{,PL1-index}*)]
                       | PL1-expression} ;
nor-assignment :: = <*I> [(PL1-index{,PL1-index}*)] =
                    {element-operation | qualified-name (element-ref)
                     [(PL1-index{,PL1-index}*)] | type qualified-name
                     | PL1-expression} ;
type :: = {POINTER|PT} | {SET|ST} | {NODE|ND} | {BRANCH|BR}
          | {SUBGRAPH|SG} | {GRAPH|GR} .

Some remarks on implementation

The implementation of the data-structure for SOL will be most efficient
in storage space, while meeting the requirements for dynamic addition and
deletion of arbitrary elements, if the representation is a list structure
corresponding topologically to the actual element structure.

Figure A.3 shows the current implementation for the basic structural
elements of SOL. Each element has a unique entry in the Element Table (ETAB).
The entry has a name field (8 characters), a type field (2 characters), and
4 pointer fields. Only two of these pointer fields are shown in the figure.
The first pointer points to the list which defines the data structures
associated with this element. The second pointer will always point (possibly
through a chain of pointers) back to elements in the ETAB. Since nested sets are
allowed, the type field is necessary to determine the terminal elements
(nodes, branches and subgraphs) of a graph or a set. The third pointer (not
shown in the figure) is used to point back to the element which has this
element as one of its constituents. The fourth and last pointer is used to
save the memory of this element (nodes and branches).

Fig. A.3. SOL data-structure: Element Table (ETAB) entries with name, type,
and pointer fields, and sample entries for graph, subgraph, set, pointer,
branch, and node elements.

The sample entries in the figure show the pointer structure for each of
the element types. A pointer element simply points to another element in
ETAB. A set element points to a pointer list, which in turn points to a set
of elements in ETAB. Graph and subgraph elements point to an array of three
pointers, which point to pointer lists designating the node set, branch set,
and subgraph set of the graph (or subgraph). A node element contains a
pointer to a pointer list designating the set of incident branches of the
node. A branch element points to an array of two pointers, which point to the
tail and the head of the branch. Deletion of an element is accomplished by
changing its type field to "GB". Garbage collection is performed by
periodically updating all sets, subgraphs, and graphs: references to the
elements of type "GB" are deleted, and the space occupied by these
elements and their structural definitions is actually freed. Since the sets
are explicitly freed when they are of no further use, our garbage collection
routine is mainly concerned with cleaning up the graphs.
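
The deletion-by-retyping scheme described above can be sketched as follows. This is an illustrative Python model, not the PL/1 based-variable implementation: the class `Entry`, its field names (`assoc`, `struct`, `owner`, `memory`, chosen to mirror the four pointer fields) and the functions `delete` and `collect_garbage` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Entry:
    """One ETAB entry; the same layout is used for every element type."""
    name: str                   # 8 characters in the text
    type: str                   # "PT", "ST", "ND", "BR", "SG", "GR" or "GB"
    assoc: object = None        # 1st pointer: associated data structures
    struct: object = None       # 2nd pointer: structure, leads back into ETAB
    owner: object = None        # 3rd pointer: element containing this one
    memory: list = field(default_factory=list)  # 4th pointer: parse memory


def delete(entry):
    """Deletion only changes the type field; space is reclaimed later."""
    entry.type = "GB"


def _live(entries):
    return [e for e in entries if e.type != "GB"]


def collect_garbage(etab):
    """Drop references to "GB" entries, then drop the entries themselves."""
    for e in etab:
        if isinstance(e.struct, dict):      # graph or subgraph: three sets
            e.struct = {k: _live(v) for k, v in e.struct.items()}
        elif isinstance(e.struct, list):    # set, or a node's branch list
            e.struct = _live(e.struct)
    return _live(etab)


# Hypothetical table: graph G with nodes N1, N2 and a branch B1 from N1 to N2.
n1, n2 = Entry("N1", "ND"), Entry("N2", "ND")
b1 = Entry("B1", "BR", struct=(n1, n2))     # (tail, head)
n1.struct, n2.struct = [b1], [b1]           # incident branches
g = Entry("G", "GR",
          struct={"nodes": [n1, n2], "branches": [b1], "subgraphs": []})
etab = [g, n1, n2, b1]

delete(n2)
delete(b1)                                  # the branch adjacent to N2 goes too
etab = collect_garbage(etab)
```

After the collection the table holds only G and N1, and neither of them refers to a "GB" entry any more. The single entry layout leaves some fields unused for certain element types, as noted in the text.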

Since list processing in PL/1 is performed using based variables, we
had to use these variables to implement all the list processing algorithms for
SOL. Wherever possible, it is recommended that the type of the elements
involved in a search be specified, to avoid unnecessary, time-consuming linear
searches. We have used a single entry structure for all the different types of
elements, which leaves some fields unused for certain types of elements.
The language was implemented using the top-down parser generated by the
TACOS compiler system. Actions are written in a PL/1 program approximately
1000 statements long. The SOL execution-time routines are also written in PL/1.
The SOL compiler generates PL/1 programs which simulate the actions intended
by the input SOL program. These generated PL/1 programs are lengthy, and the
PL/1 compiler is terribly slow. It would be worthwhile to try an interpretive
version of SOL, which would also be useful for an interactive SOL.

Sample  SOL  program 

TRNS: PROC (PTR1,BNAME);
% /* THIS PROCEDURE WILL CREATE ALL THE BRANCHES IMPLIED BY THE
     TRANSITIVITY OF THE BRANCHES WITH LABEL "BNAME" IN THE GRAPH POINTED
     TO BY PTR1 */
  DCL (PTR1,PTR2,B1,B2) POINTER,
      (BNAME,NAME,NAME1) CHAR(8); %
  FOR (ALL,B,BOF(PTR1));
    NAME = NAME(B);
    IF 'NAME = BNAME' THEN DO;
      PTR2 = HEAD(B);
      FOR (ALL,B1,OUTBR(PTR2));
        NAME1 = NAME(B1);
        IF 'NAME1 = BNAME' THEN
          /* NEW BRANCH FROM THE TAIL OF B TO THE HEAD OF B1 */
          ADD BR BNAME(TAIL(B),HEAD(B1)) TO PTR1;
      END; END;
  END;
END TRNS;
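
For readers unfamiliar with SOL, the idea of the TRNS procedure can be restated in Python. This is a sketch under assumptions rather than a translation: branches are modelled as (name, tail, head) tuples, the function name `add_transitive_branches` is hypothetical, and the duplicate check is an addition that the SOL text does not show.

```python
def add_transitive_branches(branches, label):
    """One pass: for a -> b and b -> c, both named `label`, add a -> c."""
    existing = set(branches)
    added = []
    for name, tail, head in branches:
        if name != label:
            continue
        # Look at the branches leaving the head of the first branch.
        for name1, tail1, head1 in branches:
            if name1 == label and tail1 == head:
                new = (label, tail, head1)
                if new not in existing:  # assumption: skip duplicates
                    existing.add(new)
                    added.append(new)
    return branches + added


# Hypothetical scene graph: A above B, B above C, C left of D.
scene = [("ABOVE", "A", "B"), ("ABOVE", "B", "C"), ("LOF", "C", "D")]
closed = add_transitive_branches(scene, "ABOVE")
```

Like the SOL procedure, this performs a single pass over the branches present at the start; longer chains would need the pass to be repeated until no branch is added.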


Syntax used in current implementation

sol-program :: = {PL1-statements | sol-statement}*
PL1-statements :: = % {PL1-statement}* %
sol-statement :: = labels {block | declaration | association | operation
                           | if-statement | else-statement | assignment
                           | go-to-statement | end-statement
                           | call-statement} ;
block :: = BEGIN | {PROCEDURE|PROC} [(<*I>{,<*I>}*)] | DO
labels :: = {<*I>:}*
declaration :: = {DECLARE|DCL} [integer] <*I> attribute
                 {,[integer] <*I> attribute}*
attribute :: = {GRAPH|GR} [UNORIENTED]
               | {SUBGRAPH|SG} [DISJOINT]
               | {NODE|ND}
               | {BRANCH|BR} ({<*I>|node-ref},{<*I>|node-ref})
               | {SET|ST}
               | {POINTER|PT}
association :: = associate | free
associate :: = {PTASOC|STASOC|NDASOC|BRASOC|SGASOC|GRASOC}
               [(element-ref{,element-ref}*)] PL1-declaration-tail ;
PL1-declaration-tail :: = PL1-element-dcl {,PL1-element-dcl}*
free :: = {PTFREE|STFREE|NDFREE|BRFREE|SGFREE|GRFREE}
          [(element-ref{,element-ref}*)] <*I>{,<*I>}*
set-ref :: = sets-set | node-set | graph-set | {SET|ST} <*I> | pointer-ref
sets-set :: = set-set-mnemonic (set-ref,set-ref)
set-set-mnemonic :: = UNION | INTER | DIF | SYMDIF
node-ref :: = branch-node | {NODE|ND} qualified-name | pointer-ref
branch-node :: = {HEAD|TAIL} (branch-ref)


branch-ref :: = {BRANCH|BR} <*I> (node-ref,node-ref)
                | {BRANCH|BR} qualified-name | pointer-ref
qualified-name :: = <*I>{.<*I>}*
node-set :: = {ADJBR|INCBR|OUTBR} (node-ref)
graph-ref :: = {GRAPH|GR} <*I> | pointer-ref
pointer-ref :: = {POINTER|PT} <*I>
graph-set :: = {NOF|BOF} ({graph-ref|subgraph-ref})
subgraph-ref :: = {SUBGRAPH|SG} qualified-name [DISJOINT] | pointer-ref
operation :: = loop-control | data-operation | pointer-operation
               | node-operation | subgraph-operation
element-operation :: = sets-set | sets-boolean | graph-set | node-set
                       | branch-node | set-integer | element-string
sets-boolean :: = {EQUALS|SUBSET} (set-ref,set-ref)
set-integer :: = CARD(set-ref)
element-string :: = {NAME|TYPE} (element-ref)
element-ref :: = pointer-ref | set-ref | node-ref | branch-ref
                 | subgraph-ref | graph-ref
loop-control :: = FOR({ANY|ALL|integer},<*I>,set-ref)
data-operation :: = add-operation | delete-operation
add-operation :: = ADD element-ref TO element-ref
delete-operation :: = {DELETE|DEL} element-ref FROM element-ref
pointer-operation :: = {JUMP|MOVE|OMOVE} pointer-ref
                          [{branch-ref|node-ref} | label-name]
                       | {MOVE2|OMOVE2} pointer-ref
                          [{branch-ref|label-name} [{node-ref|label-name}]
                           | {node-ref|label-name} [{branch-ref|label-name}]]
label-name :: = {LABEL|L} <*I>


node-operation :: = {PARTITION|GENERATE} node-ref [INTO]
                    {graph-ref|subgraph-ref} AS <*I>;
subgraph-operation :: = BACKUP node-ref
                        | {MERGE|PARSE|DISCONNECT|SINK|SOURCE|LINK} set-ref
                          [{label-name|branch-ref}] [AS <*I>];
*go-to-statement :: = {GO TO|GOTO} <*I>
end-statement :: = END [<*I>]
*call-statement :: = CALL <*I> [(<*I>{,<*I>}*)]
*if-statement :: = IF character-string THEN if-tail [else-statement]
*if-tail :: = {PL1-statement|sol-statement}*
*else-statement :: = ELSE if-tail
assignment :: = set-assignment | equ-assignment | assoc-assignment
                | nor-assignment
set-assignment :: = set-ref = (element-ref{,element-ref}*)
equ-assignment :: = #<*I> = qualified-name .
assoc-assignment :: = qualified-name (element-ref)
                      [(PL1-index{,PL1-index}*)] =
                      {qualified-name (element-ref) [(PL1-index{,PL1-index}*)]
                       | PL1-expression}
nor-assignment :: = <*I> [(PL1-index{,PL1-index}*)] =
                    {element-operation | qualified-name (element-ref)
                     [(PL1-index{,PL1-index}*)] | type qualified-name
                     | PL1-expression}
type :: = {POINTER|PT} | {SET|ST} | {NODE|ND} | {BRANCH|BR}
          | {SUBGRAPH|SG} | {GRAPH|GR}

Note: All the phrases whose names start with PL1 are defined as their
counterparts in PL/1.

<*I> is an identifier, and is defined as the identifiers in PL/1. We
allow a maximum of 8 characters for each identifier.

Integer is a decimal whole number.

Character-string is a string of any combination of all permissible characters
(including the double quotation mark) between two quotation marks.

Reserved  Words 

ADD, ADJBR, ALL, ANY, AS
BEGIN, BETWEEN, BOF, BR, BRANCH, BRASOC, BRFREE
CALL, CARD
DCL, DECLARE, DEL, DELETE, DIF, DISJOINT, DISCONNECT, DO
ELSE, END, EQUALS
FOR, FROM
GENERATE, GO, GO TO, GR, GRAPH, GRASOC, GRFREE
HEAD
IF, INCBR, INTER, INTO
JUMP
L, LABEL, LINK
MERGE, MOVE, MOVE2
NAME, ND, NDASOC, NDFREE, NODE, NOF
OMOVE, OMOVE2, OUTBR
PARSE, PROC, PROCEDURE, PT, PTASOC, PTFREE, POINTER, PARTITION
SET, SG, SGASOC, SGFREE, SINK, SOURCE, ST, STASOC, STFREE, SUBSET, SYMDIF
TAIL, THEN, TO, TYPE
UNION, UNORIENTED




APPENDIX  B 
ADDITIONAL  EXAMPLES 


Here we have included additional examples of scene analysis. Fig. B.1
shows a rather complex scene. Fig. B.2 is its graphical representation.
In the following pages, the SOL program which created this graph and its
associations is given, and it is followed by the computer output of the
results of this analysis.

Fig. B.3(a) displays a scene (chair) whose regions are divided, by
occlusion or additional structure, into several sub-regions. A T-joint
merging procedure will merge these regions and conform the input to our
model graph. Then the chair is recognized normally. The analysis results
are shown in the following pages.


Fig. B.1. A more complex scene example

Fig. B.2. Graphical representation of scene in Fig. B.1


NONAME
PAR* = DISJOINT
EXAM: PROC OPTIONS(MAIN);
% /* THIS IS AN EXAMPLE OF A SCENE */
DCL GRAPH POINTER EXTERNAL,
    SMOOTH ENTRY(POINTER),
    SEE BIT(1) EXTERNAL,
    SRCTRL POINTER CONTROLLED EXTERNAL;
DCL RECOG ENTRY(DEC FIXED,DEC FIXED);
ALLOCATE SRCTRL; SRCTRL=NULL;
ALLOCATE SRCTRL; %
DCL PIC9 GR,
    ABOVE BR(N1,N4),
    ROF BR(N2,N1),
    ABOVE BR(N1,N11),
    ABOVE BR(N2,N11),
    ROF BR(N3,N2),
    LOF BR(N6,N11),
    CONTAIN BR(N6,N5),
    CONTAIN BR(N6,N10),
    NEXT BR(N4,N6),
    NEXT BR(N7,N6),
    NEXT BR(N8,N6),
    NEXT BR(N9,N6),
    INF BR(N15,N11),
    DBELOW BR(N15,N14),
    DABOVE BR(N15,N16),
    ADJ BR(N24,N23),
    ADJ BR(N24,N25),
    ADJ BR(N26,N25),
    ADJ BR(N26,N23),
    ADJ BR(N21,N22),
    ADJ BR(N17,N18),
    ADJ BR(N17,N20),
    ADJ BR(N19,N18),
    ADJ BR(N19,N20),
    CONTAIN BR(N16,N17),
    CONTAIN BR(N16,N18),
    CONTAIN BR(N16,N19),
    CONTAIN BR(N16,N20),
    CONTAIN BR(N16,N21),
    CONTAIN BR(N16,N22),
    CONTAIN BR(N16,N23),
    CONTAIN BR(N16,N24),
    CONTAIN BR(N16,N25),
    CONTAIN BR(N16,N26),
    LOF BR(N27,N16),
    ROF BR(N33,N16),
    INB BR(N28,N29),
    INB BR(N30,N31),
    LOF BR(N29,N42),
    INF BR(N35,N34),
    INF BR(N37,N36),
    LOF BR(N71,N37),
    INF BR(N43,N16),
    DABOVE BR(N27,N28),
    DABOVE BR(N29,N30),
    DABOVE BR(N31,N32),
    DABOVE BR(N33,N34),
    DABOVE BR(N35,N36),
    DABOVE BR(N37,N38),
    DLOF BR(N42,N39),
    DROF BR(N43,N39),
    INSIDE BR(N40,N39),
    INSIDE BR(N41,N39),
    INSIDE BR(N44,N39),
    INSIDE BR(N45,N39),
    INSIDE BR(N46,N39),
    INSIDE BR(N47,N39),
    DBELOW BR(N48,N39),
    DABOVE BR(N48,N49),
    NEXT BR(N50,N49),
    NEXT BR(N52,N49),
    NEXT BR(N54,N49),
    NEXT BR(N55,N49),
    NEXT BR(N50,N51),
    NEXT BR(N52,N53),
    NEXT BR(N55,N56),
    NEXT BR(N54,N57),
    ROF BR(N61,N53),
    DLOF BR(N61,N58),
    DLOF BR(N58,N62),
    INSIDE BR(N59,N58),
    INSIDE BR(N60,N58),
    INSIDE BR(N63,N58),
    INSIDE BR(N64,N58),
    INSIDE BR(N65,N58),
    INSIDE BR(N66,N58),
    DABOVE BR(N58,N67),
    DABOVE BR(N67,N68),
    NEXT BR(N68,N69),
    NEXT BR(N68,N70),
    NEXT BR(N68,N75),
    NEXT BR(N68,N76),
    NEXT BR(N69,N72),
    NEXT BR(N70,N71),
    NEXT BR(N76,N78),
    NEXT BR(N75,N77),
    HOLD BR(N71,N73),
    NEXT BR(N73,N74);
NDASOC (NOF(PIC9)) 1 SHAPE, 2 SHP DEC FIXED,
                            2 LTOW DEC FIXED(5,2);
SHAPE.SHP(PIC9.N1)=22;
SHAPE.SHP(PIC9.N2)=22;
SHAPE.SHP(PIC9.N3)=22;
SHAPE.SHP(PIC9.N4)=3;
SHAPE.SHP(PIC9.N5)=4;
SHAPE.SHP(PIC9.N6)=13;
SHAPE.SHP(PIC9.N7)=3;
SHAPE.SHP(PIC9.N8)=3;
SHAPE.SHP(PIC9.N9)=3;
SHAPE.SHP(PIC9.N10)=1;
SHAPE.SHP(PIC9.N11)=28;
SHAPE.SHP(PIC9.N14)=1;
SHAPE.SHP(PIC9.N15)=8;
SHAPE.SHP(PIC9.N16)=1;
SHAPE.SHP(PIC9.N17)=1;
SHAPE.SHP(PIC9.N18)=1;
SHAPE.SHP(PIC9.N19)=1;
SHAPE.SHP(PIC9.N20)=1;
SHAPE.SHP(PIC9.N21)=1;
SHAPE.SHP(PIC9.N22)=1;
SHAPE.SHP(PIC9.N23)=1;
SHAPE.SHP(PIC9.N24)=1;
SHAPE.SHP(PIC9.N25)=1;
SHAPE.SHP(PIC9.N26)=1;
SHAPE.SHP(PIC9.N27)=18;
SHAPE.SHP(PIC9.N28)=1;
SHAPE.SHP(PIC9.N29)=18;
SHAPE.SHP(PIC9.N30)=1;
SHAPE.SHP(PIC9.N31)=18;
SHAPE.SHP(PIC9.N32)=1;
SHAPE.SHP(PIC9.N33)=19;
SHAPE.SHP(PIC9.N34)=8;
SHAPE.SHP(PIC9.N35)=19;
SHAPE.SHP(PIC9.N36)=8;
SHAPE.SHP(PIC9.N37)=19;
SHAPE.SHP(PIC9.N38)=8;
SHAPE.SHP(PIC9.N39)=4;
SHAPE.SHP(PIC9.N40)=27;
SHAPE.SHP(PIC9.N41)=27;
SHAPE.SHP(PIC9.N42)=24;
SHAPE.SHP(PIC9.N43)=24;
SHAPE.SHP(PIC9.N44)=4;
SHAPE.SHP(PIC9.N45)=4;
SHAPE.SHP(PIC9.N46)=13;
SHAPE.SHP(PIC9.N47)=8;
SHAPE.SHP(PIC9.N48)=29;
SHAPE.SHP(PIC9.N49)=7;
SHAPE.SHP(PIC9.N50)=8;
SHAPE.SHP(PIC9.N51)=23;
SHAPE.SHP(PIC9.N52)=8;
SHAPE.SHP(PIC9.N53)=23;
SHAPE.SHP(PIC9.N54)=8;
SHAPE.SHP(PIC9.N55)=8;
SHAPE.SHP(PIC9.N56)=9;
SHAPE.SHP(PIC9.N57)=9;
SHAPE.SHP(PIC9.N58)=4;
SHAPE.SHP(PIC9.N59)=27;
SHAPE.SHP(PIC9.N60)=27;
SHAPE.SHP(PIC9.N61)=24;
SHAPE.SHP(PIC9.N62)=24;
SHAPE.SHP(PIC9.N63)=4;
SHAPE.SHP(PIC9.N64)=4;
SHAPE.SHP(PIC9.N65)=13;
SHAPE.SHP(PIC9.N66)=8;
SHAPE.SHP(PIC9.N67)=29;
SHAPE.SHP(PIC9.N68)=7;
SHAPE.SHP(PIC9.N69)=8;
SHAPE.SHP(PIC9.N70)=8;
SHAPE.SHP(PIC9.N71)=23;
SHAPE.SHP(PIC9.N72)=23;
SHAPE.SHP(PIC9.N73)=6;
SHAPE.SHP(PIC9.N74)=1;
SHAPE.SHP(PIC9.N75)=8;
SHAPE.SHP(PIC9.N76)=8;
SHAPE.SHP(PIC9.N77)=9;
SHAPE.SHP(PIC9.N78)=9;
SHAPE.LTOW(PIC9.N1)=5.6;
SHAPE.LTOW(PIC9.N2)=6.7;
SHAPE.LTOW(PIC9.N3)=4.3;
SHAPE.LTOW(PIC9.N4)=3.5;
SHAPE.LTOW(PIC9.N5)=2.1;
SHAPE.LTOW(PIC9.N6)=7.3;
SHAPE.LTOW(PIC9.N7)=3.5;
SHAPE.LTOW(PIC9.N8)=2.3;
SHAPE.LTOW(PIC9.N9)=2.3;
SHAPE.LTOW(PIC9.N10)=1.8;
SHAPE.LTOW(PIC9.N11)=1.7;
SHAPE.LTOW(PIC9.N14)=1.4;
SHAPE.LTOW(PIC9.N15)=4.5;
SHAPE.LTOW(PIC9.N16)=1.4;
SHAPE.LTOW(PIC9.N17)=1.1;
SHAPE.LTOW(PIC9.N18)=1.1;
SHAPE.LTOW(PIC9.N19)=1.1;
SHAPE.LTOW(PIC9.N20)=1.1;
SHAPE.LTOW(PIC9.N21)=4.5;
SHAPE.LTOW(PIC9.N22)=4.5;
SHAPE.LTOW(PIC9.N23)=1.2;
SHAPE.LTOW(PIC9.N24)=1.2;
SHAPE.LTOW(PIC9.N25)=1.2;
SHAPE.LTOW(PIC9.N26)=1.2;
SHAPE.LTOW(PIC9.N27)=1.2;
SHAPE.LTOW(PIC9.N28)=7.3;
SHAPE.LTOW(PIC9.N29)=1.2;
SHAPE.LTOW(PIC9.N30)=7.3;
SHAPE.LTOW(PIC9.N31)=1.2;
SHAPE.LTOW(PIC9.N32)=7.3;
SHAPE.LTOW(PIC9.N33)=4.7;
SHAPE.LTOW(PIC9.N34)=5.2;
SHAPE.LTOW(PIC9.N35)=4.7;
SHAPE.LTOW(PIC9.N36)=5.2;
SHAPE.LTOW(PIC9.N37)=4.7;
SHAPE.LTOW(PIC9.N38)=5.2;
SHAPE.LTOW(PIC9.N39)=1.2;
SHAPE.LTOW(PIC9.N40)=4.3;
SHAPE.LTOW(PIC9.N41)=4.3;
SHAPE.LTOW(PIC9.N42)=3.5;
SHAPE.LTOW(PIC9.N43)=3.5;
SHAPE.LTOW(PIC9.N44)=1.5;
SHAPE.LTOW(PIC9.N45)=1.5;
SHAPE.LTOW(PIC9.N46)=3.3;
SHAPE.LTOW(PIC9.N47)=1.8;
SHAPE.LTOW(PIC9.N48)=1.4;
SHAPE.LTOW(PIC9.N49)=1.3;
SHAPE.LTOW(PIC9.N50)=7.6;
SHAPE.LTOW(PIC9.N51)=1.7;
SHAPE.LTOW(PIC9.N52)=7.6;
SHAPE.LTOW(PIC9.N53)=1.7;
SHAPE.LTOW(PIC9.N54)=5.4;
SHAPE.LTOW(PIC9.N55)=5.4;
SHAPE.LTOW(PIC9.N56)=3.1;
SHAPE.LTOW(PIC9.N57)=3.1;
SHAPE.LTOW(PIC9.N58)=1.2;
SHAPE.LTOW(PIC9.N59)=4.3;
SHAPE.LTOW(PIC9.N60)=4.3;
SHAPE.LTOW(PIC9.N61)=3.5;
SHAPE.LTOW(PIC9.N62)=3.5;
SHAPE.LTOW(PIC9.N63)=1.5;
SHAPE.LTOW(PIC9.N64)=1.5;
SHAPE.LTOW(PIC9.N65)=3.2;
SHAPE.LTOW(PIC9.N66)=1.8;
SHAPE.LTOW(PIC9.N67)=1.4;
SHAPE.LTOW(PIC9.N68)=1.3;
SHAPE.LTOW(PIC9.N69)=7.5;
SHAPE.LTOW(PIC9.N70)=7.5;
SHAPE.LTOW(PIC9.N71)=1.7;
SHAPE.LTOW(PIC9.N72)=1.7;
SHAPE.LTOW(PIC9.N73)=11.3;
SHAPE.LTOW(PIC9.N74)=1.8;
SHAPE.LTOW(PIC9.N75)=5.4;
SHAPE.LTOW(PIC9.N76)=5.4;
SHAPE.LTOW(PIC9.N77)=3.1;
SHAPE.LTOW(PIC9.N78)=3.1;
SRCTRL=PIC9.N15;
GRAPH=PIC9;
CALL SMOOTH(GRAPH);
SEE='0'B;
CALL RECOG(0,0); %
END;


BRIEF DESCRIPTION OF THE SCENE.

THE FOLLOWING OBJECTS WERE PARSED IN THE SCENE:

1. MOUNTAIN
2. CLOUD
3. CLOUD
4. PLANE
5. HOUSE
6. CLOUD
7. TREE
8. TREE
9. MAN
10. TREE
11. TREE
12. MAN
13. TREE
14. TREE
15. RAG

END OF BRIEF DESCRIPTION.


FULL DESCRIPTION OF THE SCENE.

THE FOLLOWING SUCCESSFUL STEPS WERE TAKEN IN PARSING THE SCENE.
OBJECT *RAG**.
REGION *N73** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.
OBJECT *TREE**.
REGION *N37** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.
OBJECT *TREE**.
REGION *N31** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.
OBJECT *MAN**.
OBJECT *FACE**.
REGION *N61** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.
OBJECT *TREE**.
REGION *N3[illegible]** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.
OBJECT *TREE**.
REGION *N29** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.
OBJECT *MAN**.
OBJECT *FACE**.
REGION *N43** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.
OBJECT *TREE**.
REGION *N33** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.
OBJECT *TREE**.
REGION *N27** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.
OBJECT *CLOUD**.
REGION *N3** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.
OBJECT *HOUSE**.
OBJECT *[illegible]**.
REGION *N1[illegible]** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.
OBJECT *PLANE**.
REGION *N6** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.
OBJECT *CLOUD**.
REGION *N2** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.
OBJECT *CLOUD**.
REGION *N1** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.
OBJECT *MOUNTAIN**.
REGION *N11** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.

THIS CONCLUDED THE [illegible] OF THE SUCCESSFUL ATTEMPTS.


THE PARSED PICTORIAL INFORMATION IS AS FOLLOWS:

A *MOUNTAIN** IS FOUND IN THE SCENE.

ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THIS *MOUNTAIN** IS LOCATED BELOW THE *CLOUD**.
THIS *MOUNTAIN** IS LOCATED BELOW THE SECOND *CLOUD**.
THIS *MOUNTAIN** IS LOCATED AT THE RIGHT OF THE *PLANE**.
THIS *MOUNTAIN** IS LOCATED IN BEHIND OF THE *HOUSE**.

END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

A *CLOUD** IS FOUND IN THE SCENE.

ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THIS *CLOUD** IS LOCATED ABOVE THE *MOUNTAIN**.
THIS *CLOUD** IS LOCATED AT THE LEFT OF THE SECOND *CLOUD**.
THIS *CLOUD** IS LOCATED ABOVE THE *PLANE**.

END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

A SECOND *CLOUD** IS FOUND IN THE SCENE.

ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THIS *CLOUD** IS LOCATED ABOVE THE *MOUNTAIN**.
THIS *CLOUD** IS LOCATED AT THE RIGHT OF THE *CLOUD**.
THIS *CLOUD** IS LOCATED AT THE LEFT OF THE THIRD *CLOUD**.

END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

A *PLANE** IS FOUND IN THE SCENE.

ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THIS *PLANE** IS LOCATED BELOW THE *CLOUD**.
THIS *PLANE** IS LOCATED AT THE LEFT OF THE *MOUNTAIN**.

END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

A *HOUSE** IS FOUND IN THE SCENE.

ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THIS *HOUSE** IS LOCATED IN FRONT OF THE *MOUNTAIN**.
THIS *HOUSE** IS LOCATED AT THE RIGHT OF THE *TREE**.
THIS *HOUSE** IS LOCATED AT THE LEFT OF THE SECOND *TREE**.
THIS *HOUSE** IS LOCATED IN BEHIND OF THE *MAN**.

END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

A THIRD *CLOUD** IS FOUND IN THE SCENE.

ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THIS *CLOUD** IS LOCATED AT THE RIGHT OF THE SECOND *CLOUD**.

END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

A *TREE** IS FOUND IN THE SCENE.

ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THIS *TREE** IS LOCATED AT THE LEFT OF THE *HOUSE**.
THIS *TREE** IS LOCATED IN BEHIND OF THE THIRD *TREE**.

END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

A SECOND *TREE** IS FOUND IN THE SCENE.

ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THIS *TREE** IS LOCATED AT THE RIGHT OF THE *HOUSE**.
THIS *TREE** IS LOCATED IN BEHIND OF THE FOURTH *TREE**.

END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.


A *MAN** IS FOUND IN THE SCENE.

ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THIS *MAN** IS LOCATED IN FRONT OF THE *HOUSE**.
THIS *MAN** IS LOCATED AT THE RIGHT OF THE THIRD *TREE**.
THIS *MAN** IS LOCATED AT THE LEFT OF THE SECOND *MAN**.

END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

A THIRD *TREE** IS FOUND IN THE SCENE.

ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THIS *TREE** IS LOCATED IN FRONT OF THE *TREE**.
THIS *TREE** IS LOCATED AT THE LEFT OF THE *MAN**.
THIS *TREE** IS LOCATED IN BEHIND OF THE FIFTH *TREE**.

END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

A FOURTH *TREE** IS FOUND IN THE SCENE.

ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THIS *TREE** IS LOCATED IN FRONT OF THE SECOND *TREE**.
THIS *TREE** IS LOCATED IN BEHIND OF THE SIXTH *TREE**.

END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.


A SECOND *MAN** IS FOUND IN THE SCENE.

ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THIS *MAN** IS LOCATED AT THE RIGHT OF THE *MAN**.
THIS *MAN** IS LOCATED AT THE LEFT OF THE SIXTH *TREE**.
THE RELATION *HOLD** BETWEEN THE *MAN** AND THE *BAG
IS NOT KNOWN TO THE RECOGNIZER.

END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

A FIFTH *TREE** IS FOUND IN THE SCENE.

ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THIS *TREE** IS LOCATED IN FRONT OF THE THIRD *TREE**.

END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

A SIXTH *TREE** IS FOUND IN THE SCENE.

ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THIS *TREE** IS LOCATED IN FRONT OF THE FOURTH *TREE**.
THIS *TREE** IS LOCATED AT THE RIGHT OF THE SECOND *MAN**.

END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

A *BAG** IS FOUND IN THE SCENE.

ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:
THE RELATION *HOLD** BETWEEN THE *BAG** AND THE SECOND *MAN
IS NOT KNOWN TO THE RECOGNIZER.

END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

END OF SCENE DESCRIPTION.
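In the listing above each spatial relation is reported twice, once from each object's side (the mountain is below the cloud, and the cloud is above the mountain). The following Python sketch stores a few of the printed relations as (subject, relation, object) triples and checks that every one has its converse. The triple representation, the shortened relation names ("RIGHT OF" for "AT THE RIGHT OF", "BEHIND" for "IN BEHIND OF"), and the name `missing_converses` are assumptions made for illustration; the recognizer's internal data structure is not shown in this appendix.

```python
# Converse of each spatial relation used in the printed description.
CONVERSE = {
    "ABOVE": "BELOW", "BELOW": "ABOVE",
    "LEFT OF": "RIGHT OF", "RIGHT OF": "LEFT OF",
    "IN FRONT OF": "BEHIND", "BEHIND": "IN FRONT OF",
}

# A subset of the relations printed above, as (subject, relation, object).
RELATIONS = [
    ("MOUNTAIN", "BELOW", "CLOUD"),
    ("CLOUD", "ABOVE", "MOUNTAIN"),
    ("MOUNTAIN", "RIGHT OF", "PLANE"),
    ("PLANE", "LEFT OF", "MOUNTAIN"),
    ("MOUNTAIN", "BEHIND", "HOUSE"),
    ("HOUSE", "IN FRONT OF", "MOUNTAIN"),
    ("CLOUD", "ABOVE", "PLANE"),
    ("PLANE", "BELOW", "CLOUD"),
]

def missing_converses(relations):
    """Return the converse triples that are absent from `relations`."""
    present = set(relations)
    missing = []
    for subject, relation, obj in relations:
        converse = (obj, CONVERSE[relation], subject)
        if converse not in present:
            missing.append(converse)
    return missing

print(missing_converses(RELATIONS))
```

A relation such as *HOLD**, which the recognizer reports as unknown, has no entry in the table and would have to be handled separately.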


Fig.  B.3.   A  scene  example  with  divided  regions:  (a)  picture;  (b)  input  graph.


NODES N2 AND N1 ARE MERGED INTO ONE NODE, NAMED GEN11

NODES N4 AND N3 ARE MERGED INTO ONE NODE, NAMED GEN12

NODES N5 AND N6 ARE MERGED INTO ONE NODE, NAMED GEN13

NODES N8 AND N7 ARE MERGED INTO ONE NODE, NAMED GEN14

NODES N10 AND N9 ARE MERGED INTO ONE NODE, NAMED GEN15

NODES N12 AND N11 ARE MERGED INTO ONE NODE, NAMED GEN16

NODES GEN12 AND GEN13 ARE MERGED INTO ONE NODE, NAMED GEN17
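The merge steps reported above can be pictured as contracting two nodes of the input graph into a single generated node that inherits the neighbours of both. The following Python sketch shows one such contraction on an undirected adjacency-set graph. It is only an illustration: the function `merge_nodes` and the four-node chain used as input are hypothetical, since the edges of the graph in Fig. B.3(b) cannot be recovered from this listing.

```python
def merge_nodes(adj, a, b, new_name):
    """Contract nodes a and b of an undirected graph into one node, new_name.

    adj maps each node name to the set of its neighbours and is modified
    in place.
    """
    # Neighbours of the merged node: everything adjacent to a or b,
    # except a and b themselves.
    neighbours = (adj[a] | adj[b]) - {a, b}
    # Remove both old nodes and every reference to them.
    for old in (a, b):
        for m in adj.pop(old):
            if m in adj:
                adj[m].discard(old)
    # Insert the generated node and link it symmetrically.
    adj[new_name] = set(neighbours)
    for m in neighbours:
        adj[m].add(new_name)
    return adj

# Hypothetical chain N1 - N2 - N3 - N4 (not the real graph of Fig. B.3).
graph = {
    "N1": {"N2"},
    "N2": {"N1", "N3"},
    "N3": {"N2", "N4"},
    "N4": {"N3"},
}
merge_nodes(graph, "N2", "N1", "GEN11")
merge_nodes(graph, "N4", "N3", "GEN12")
print(graph)
```

Merged nodes stay ordinary nodes of the graph, so a later step can merge two generated nodes in the same way, as the last line of the log does with GEN12 and GEN13.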


BRIEF DESCRIPTION OF THE SCENE.

THE FOLLOWING OBJECTS WERE PARSED IN THE SCENE:

1.  CHAIR

END OF BRIEF DESCRIPTION.


THE FOLLOWING SUCCESSFUL STEPS WERE TAKEN IN PARSING THE SCENE.

OBJECT *CHAIR**.
REGION *G£M1** IS THE ATTENTION POINT OF PARSING THE ABOVE STEPS.

THIS CONCLUDES THE REPORT OF THE SUCCESSFUL ATTEMPTS.


THE PARSED PICTORIAL INFORMATION IS AS FOLLOWS:

A *CHAIR** IS FOUND IN THE SCENE.

ITS RELATIONS TO THE OTHER PARTS OF THE SCENE ARE AS FOLLOWS:

END OF RELATIONAL DESCRIPTION FOR THIS OBJECT.

END OF SCENE DESCRIPTION.




VITA. 

Ahmad Eftekhari Masumi was born in Teheran, Iran, on December 11,
1944. He went to Japan as a foreign student under a Japanese Government
scholarship and graduated from the University of Tokyo, Japan,
with a Bachelor of Engineering in Electronics in 1968. Early in 1970 he
finished his graduate studies at the University of Tokyo and received a
master's degree in Electrical Engineering. He continued his research at
the same university until later in 1970.

From 1970 to 1973 he was a research assistant in the Department of
Computer Science at the University of Illinois.

Mr. Masumi is a student member of the Association for Computing
Machinery. He is also a professional member of the Institute for Electronics
and Communication Engineers of Japan and of the Information Processing
Society of Japan.


Form AEC-427    U.S. ATOMIC ENERGY COMMISSION
AECM 3201

UNIVERSITY-TYPE CONTRACTOR'S RECOMMENDATION FOR
DISPOSITION OF SCIENTIFIC AND TECHNICAL DOCUMENT

(See Instructions on Reverse Side)


1.  AEC REPORT NO.

COO-2118-0049


2.  TITLE

Picture Analysis by Graph Transformation


3.  TYPE OF DOCUMENT (Check one):

[X] a. Scientific and technical report

[ ] b. Conference paper not to be published in a journal:

Title of conference

Date of conference

Exact location of conference

Sponsoring organization

[ ] c. Other (Specify)


4.  RECOMMENDED ANNOUNCEMENT AND DISTRIBUTION (Check one):

[X] a. AEC's normal announcement and distribution procedures may be followed.

[ ] b. Make available only within AEC and to AEC contractors and other U.S. Government agencies and their contractors.

[ ] c. Make no announcement or distribution.


REASON FOR RECOMMENDED RESTRICTIONS:


SUBMITTED BY: NAME AND POSITION (Please print or type)

Ahmad E. Masumi
Research Assistant


Organization

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801


Signature


Date

October 1973


FOR   AEC   USE   ONLY 

AEC  CONTRACT  ADMINISTRATOR'S  COMMENTS,   IF    ANY.  ON   ABOVE    ANNOUNCEMENT  AND   DISTRIBUTION 
RECOMMENDATION: 


PATENT  CLEARANCE: 


Q  a.  AEC  patent  clearance  has  been  granted  by  responsible  AEC  patent  group. 
U  b.   Report  has  been  sent  to  responsible  AEC  patent  group  for  clearance. 
U  c.  Patent  clearance  not  required. 


BIBLIOGRAPHIC DATA SHEET

1.  Report No.

UIUCDCS-R-73-604

3.  Recipient's Accession No.

4.  Title and Subtitle

Picture Analysis by Graph Transformation

5.  Report Date

October 1973

7.  Author(s)

Ahmad E. Masumi

8.  Performing Organization Rept. No.

UIUCDCS-R-73-604

9.  Performing Organization Name and Address

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

10. Project/Task/Work Unit No.

US AEC AT(11-1)2118

11. Contract/Grant No.

46-26-15-303

12. Sponsoring Organization Name and Address

US AEC Chicago Operations Office
9800 South Cass Avenue
Argonne, Illinois 60439

13. Type of Report & Period Covered

14.

15. Supplementary Notes


16. Abstracts

information  represented1!;  T^TlJ??}?"  *  f  *hodo1^  *°  analyze  the  pictorial 
resent  in  ttaU^  ^IV^lrlTiZt  £%?££?-  °^tS  "°"  "** 

^catenati'g  iTlrZilVelTJ^T)   £*«*««»••  sentences  which  are  forced  hy 

rSnro/^rSio:-  £-*-- ^^^^^^^^^^ 

ictures  as  a  collection  'of  L«Z  ?  syntactical  properties  of  one  class  of 

3  easily  modified   (rotated)   to  SS  ?       I         &S  &  collection   <*  graphs  can 

*  object       It  hLfl?rf  the   System  to  recognize  different  projections   of 

«i3y       teodSible   ?o  ?hee^,tnCOUr,aginS  t0   n0te   that  ^i-tical  information   is 

troducible   to  the   system  and  xmmensely  improves   the  performance  of  the   system. 


17c. COSATI Field/Group

18. Availability Statement

Unlimited Release

19. Security Class (This Report)

UNCLASSIFIED

20. Security Class (This Page)

UNCLASSIFIED

21. No. of Pages

213

22. Price


FORM NTIS-35 (10-70)                                USCOMM-DC 40329-P71




